Re: [ceph-users] Real world benefit from SSD Journals for a more read than write cluster

Christian Balzer Sun, 12 Jul 2015 05:50:44 -0700

Hello,

thanks to Lionel for writing pretty much what I was going to, in
particular cache sizes and read-ahead cache allocations.


In addition to this, keep in mind that all writes still have to happen
twice per drive: once to the journal and once to the actual OSD.
So when the cache is too busy to merge writes nicely, your HDD IOPS are
being halved again.
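
As a rough sketch of that arithmetic (Python; the 150 IOPS per disk and 12
OSDs per controller are assumed example values, not measurements, and the
function names are made up for illustration; the 2GB cache and 10% read
cache figures are the ones Lionel quotes below):

```python
def effective_write_iops(raw_disk_iops: float, journal_on_same_disk: bool) -> float:
    """Sustained client write IOPS one OSD can take once the cache stops merging writes."""
    # With the journal on the same HDD, every client write costs two disk writes.
    disk_writes_per_client_write = 2 if journal_on_same_disk else 1
    return raw_disk_iops / disk_writes_per_client_write


def cache_share_mb(cache_mb: float, read_cache_fraction: float, osds_per_controller: int) -> float:
    """Writeback cache left per OSD after the read cache takes its cut.

    Ignores the small unusable portion of the cache, so this is an upper bound.
    """
    return cache_mb * (1.0 - read_cache_fraction) / osds_per_controller


raw_iops = 150.0  # assumed figure for a single HDD
print(effective_write_iops(raw_iops, journal_on_same_disk=True))   # journal on the HDD
print(effective_write_iops(raw_iops, journal_on_same_disk=False))  # journal on an SSD
print(cache_share_mb(2048, 0.10, 12))  # MB of controller cache per OSD
```

With these assumptions each OSD gets on the order of 150MB of controller
cache, against the ~5GB of buffer a 10GB SSD journal provides.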

Christian

On Sun, 12 Jul 2015 14:33:03 +0200 Lionel Bouton wrote:

> On 07/12/15 05:55, Alex Gorbachev wrote:
> > FWIW. Based on the excellent research by Mark Nelson
> > (http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/)
> > we have dropped SSD journals altogether, and instead went for the
> > battery protected controller writeback cache.
> 
> Note that this has limitations (and the research is nearly 2 years old):
> - the controller writeback caches are relatively small (often less than
> 4GB, 2GB is common on the controller, a small portion is not usable, and
> 10% of the rest is often used for readahead/read cache) and this is
> shared by all of your drives. If your workload is not "write spikes"
> oriented, but nearly constant writes this won't help as you will be
> limited on each OSD by roughly half of the disk IOPS. With journals on
> SSDs when you hit their limit (which is ~5GB of buffer for 10GB journals
> and not <2GB divided by the number of OSDs per controller), the limit is
> the raw disk IOPS.
> - you *must* make sure the controller is configured to switch to
> write-through when the battery/capacitor fails (or a power failure on
> hardware from the same generation could make you lose all of the OSDs
> connected to them in a single event which means data loss),
> - you should monitor the battery/capacitor status to trigger maintenance
> (and your cluster will slow down while the battery/capacitor is waiting
> for a replacement, you might want to down the associated OSDs depending
> on your cluster configuration). We mostly eliminated this problem by
> replacing the whole chassis of the servers we lease for new generations
> every 2 or 3 years: if you time the hardware replacement to match a
> fresh chassis generation this means fresh capacitors and they shouldn't
> fail you (ours are rated for 3 years).
> 
> We just ordered Intel S3710 SSDs even though we have battery/capacitor
> backed caches on the controllers: the latencies have started to rise
> nevertheless when there are long periods of write intensive activity.
> I'm currently pondering if we should bypass the write-cache for the
> SSDs. The cache is obviously less effective on them and might be more
> useful overall if it is dedicated to the rotating disks. Does anyone
> have test results with cache active/inactive on SSD journals with HP
> Smart Array p420 or p840 controllers?
> 
> Lionel
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
