Hi all,

I'm doing some work to evaluate the risks involved in running 2r storage
pools. On the face of it, my naive disk-failure calculations give me 4-5
nines of durability for a 2r pool of 100 OSDs (no copyset awareness, i.e., a
second disk failure counted purely as the chance of any one of the remaining
99 OSDs failing within the recovery time). 5 nines is just fine for our
purposes, but of course multiple disk failures are only part of the story.

The more problematic issue with 2r clusters is that any time you do planned
maintenance (our clusters spend much more time degraded because of regular
upkeep than because of real failures), you suddenly and drastically increase
the risk of data loss. So I find myself wondering if there is a way to tell
Ceph I want an extra replica created for a particular PG or set thereof,
e.g., something that would enable the functional equivalent of: "this
OSD/node is going to go offline, so please create a 3rd replica in every PG
it participates in before we shut down that OSD (or those OSDs)"...?
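To put a rough number on "drastically": the same naive model, applied to a
planned outage instead of an unplanned failure, under assumed values for the
window length and number of maintenance events per year:

# While one OSD is down for maintenance, every PG it hosts has only one
# surviving copy for the whole window (no recovery running, e.g. with noout
# set), so the exposure is the window length rather than the recovery time.

N_OSDS = 100
AFR = 0.01                  # assumed annualized failure rate per disk
MAINT_WINDOW_HOURS = 4.0    # assumed length of one maintenance outage
MAINT_EVENTS_PER_YEAR = 26  # assumed fortnightly upkeep touching an OSD/node
HOURS_PER_YEAR = 24 * 365

# Chance that any one of the 99 surviving OSDs fails during a single window.
p_loss_per_window = (N_OSDS - 1) * AFR * (MAINT_WINDOW_HOURS / HOURS_PER_YEAR)
p_loss_per_year = MAINT_EVENTS_PER_YEAR * p_loss_per_window

print(f"P(loss) per maintenance window ~ {p_loss_per_window:.2e}")
print(f"P(loss) per year from upkeep   ~ {p_loss_per_year:.2e}")

With those (assumed) inputs the upkeep-driven risk is a couple of orders of
magnitude worse than the unplanned-failure figure, hence the question above.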

-- 
Cheers,
~Blairo
