Dear Cephalopods,

In a few weeks we will receive a batch of 200GB Intel DC S3700s to augment our 
cluster, and I’d like to hear about your practical experience and discuss 
options for how best to deploy them.

We’ll be able to equip each of our 24-disk OSD servers with 4 SSDs, so they 
will become 20 OSDs + 4 SSDs per server. Until recently I’ve been planning to 
use the traditional deployment: 5 journal partitions per SSD. But as SSD-day 
approaches, I’m growing less comfortable with the idea of 5 OSDs going down every 
time an SSD fails, so perhaps there are better options out there.
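
For concreteness, the traditional layout I had in mind is roughly the following 
per SSD (device names and the equal split are just placeholders, not a tested 
recipe):

    # /dev/sdb is one of the 200GB S3700s; carve it into 5 equal journal partitions
    parted -s /dev/sdb mklabel gpt
    for i in 1 2 3 4 5; do
        parted -s /dev/sdb mkpart journal$i $(( (i-1)*20 ))% $(( i*20 ))%
    done
    # then, per spinning disk, something like:
    #   ceph-deploy osd prepare <host>:sdc:/dev/sdb1    # data disk : journal partition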

Before getting into options, I’m curious about the real-world reliability of these drives:

1) How often are DC S3700s failing in your deployments?
2) If you have SSD journals at a ratio of 1 to 4 or 5, how painful is the 
backfilling that results from an SSD failure? Have you considered tricks like 
increasing the down out interval so backfilling doesn’t happen in this case, 
leaving time for the SSD to be replaced (as sketched just below)?
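
Here I mean something along these lines (just a sketch; the 3600s value is 
arbitrary):

    # keep the mons from marking down OSDs out while the SSD is being replaced
    ceph osd set noout
    #   ... swap the SSD, recreate the journals, restart the OSDs ...
    ceph osd unset noout

or raising the automatic marking-out timer in ceph.conf on the mons (the default 
is 300s, if I recall correctly):

    [mon]
        mon osd down out interval = 3600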

Beyond the usual 5-partition deployment, is anyone running a RAID1 or RAID10 
for the journals? If so, are you using the raw block devices or formatting them 
and storing the journals as files on the SSD array(s)? Recent discussions seem 
to indicate that a journal file on XFS is just as fast as a raw block device, 
since these drives are so fast.
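
For those doing the RAID1-with-files variant, I imagine it looks more or less 
like this (device names and paths are made up):

    # mirror two of the SSDs and put an XFS filesystem on top for journal files
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    mkfs.xfs /dev/md0
    mkdir -p /var/lib/ceph/journals
    mount /dev/md0 /var/lib/ceph/journals

and then per OSD in ceph.conf, point the journal at a file:

    [osd.12]
        osd journal = /var/lib/ceph/journals/osd.12.journal
        osd journal size = 10240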

Next, I wonder how people using puppet/chef/… are handling the creation and 
re-creation of the SSD journal devices. Are you just wiping and rebuilding all 
the dependent OSDs completely when the journal dev fails? I’m not keen on 
puppetizing the re-creation of journals for OSDs... 
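
In other words, is everyone’s answer to a dead journal SSD essentially the full 
remove-and-recreate dance for each dependent OSD? Something like the following, 
with IDs and devices as placeholders:

    ceph osd out 12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    ceph-deploy osd create <host>:sdc:/dev/sdb1     # data disk : new journal partition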

We also have this crazy idea of failing over to a local journal file when an SSD 
fails: we’d quickly create a new journal, either on another SSD or on the local 
OSD filesystem, then restart the OSDs before backfilling kicks in. Thoughts?
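
Concretely, the failover I’m imagining looks something like this per affected 
OSD (osd.12 is a placeholder, and this assumes it’s acceptable to throw away 
whatever was sitting in the dead journal, which is part of what I’m asking):

    ceph osd set noout                       # keep backfilling from kicking in
    service ceph stop osd.12
    rm /var/lib/ceph/osd/ceph-12/journal     # now a dead symlink to the failed SSD partition
    ceph-osd -i 12 --mkjournal               # recreates the journal as a plain file on the OSD data fs
    service ceph start osd.12
    ceph osd unset noout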

Lastly, I would also consider using 2 of the SSDs in a data pool, with the 
other 2 SSDs holding all 20 journals (probably in a RAID1 to avoid backfilling 
10 OSDs when an SSD fails). If a 10:1 OSD-to-journal-SSD ratio performs 
adequately, that’d give us quite a few SSDs to build a dedicated high-IOPS pool.
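
If we went that route, I assume the dedicated SSD pool would be built on a 
separate CRUSH root and rule, roughly like this (names are made up, and on 
older releases the rule might have to be added by editing the crushmap by hand):

    ceph osd crush add-bucket ssd root
    ceph osd crush move node1-ssd root=ssd          # fake per-host buckets holding only the SSD OSDs
    ceph osd crush rule create-simple ssd-rule ssd host
    ceph osd pool create fast 2048 2048
    ceph osd pool set fast crush_ruleset <rule_id>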

I’d also appreciate any other suggestions/experiences which might be relevant.

Thanks!
Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --

