On 31/03/2019 17.56, Christian Balzer wrote:
> Am I correct that, unlike with replication, there isn't a maximum size
> for the set of critical-path OSDs?

As far as I know, the math for calculating the probability of data loss wrt placement groups is the same for EC and for replication. Replication to n copies should be equivalent to EC with k=1 and m=(n-1).

> Meaning that with replication x3 and typical values of 100 PGs per OSD, at
> most 300 OSDs form a set out of which 3 OSDs need to fail for data loss.
> The statistical likelihood for that, based on some assumptions, is
> significant, but not nightmarishly so.
> A cluster with 1500 OSDs in total is thus as susceptible as one with just
> 300.
> Meaning that 3 disk losses in the big cluster don't necessarily mean data
> loss at all.

Someone might correct me on this, but here's my take on the math.

If you have 100 PGs per OSD, 1500 OSDs, and replication 3, you have:

1500 * 100 / 3 = 50000 pool PGs, and thus 50000 (hopefully) different 3-sets of OSDs.

(1500 choose 3) = 561375500 possible sets of 3 OSDs

Therefore if you lose 3 random OSDs, your chance of (any) data loss is 50000/561375500 = ~0.009%. (And if you *do* get unlucky and hit the wrong set of 3 OSDs, you can expect to lose 1/50000 = ~0.002% of your data.)
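
For anyone who wants to double-check that, here's a quick Python sketch of the replication-3 case. It just plugs the example numbers above into math.comb; nothing Ceph-specific, and it assumes every PG maps to a distinct, uniformly random 3-set of OSDs:

    import math

    osds = 1500          # total OSDs in the cluster
    pgs_per_osd = 100    # typical target of 100 PGs per OSD
    size = 3             # replication factor

    pgs = osds * pgs_per_osd // size     # 50000 (hopefully distinct) 3-sets
    possible = math.comb(osds, size)     # 561375500 possible 3-sets

    print(pgs, possible)
    print(f"{pgs / possible:.4%}")       # chance 3 random failures hit a PG: ~0.0089%
    print(f"{1 / pgs:.4%}")              # data lost if they do: ~0.0020%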

> However it feels that with EC all OSDs can essentially be in the same set,
> and thus having 6 out of 1500 OSDs fail in a 10+5 EC pool with 100 PGs per
> OSD would affect every last object in that cluster, not just a subset.

The math should work essentially the same way:

1500 * 100 / 15 = 10000 15-sets of OSDs

(1500 choose 15) = 3.1215495e+35 possible 15-sets of OSDs

Now if 6 OSDs fail, that affects every potential 15-set containing those 6 OSDs, with the other 9 members chosen from the remaining OSDs in the cluster:

((1500 - 6) choose 9) = 9.9748762e+22

Putting it together, the chance of any data loss from a simultaneous loss of 6 random OSDs:

10000 * 9.9748762e+22 / 3.1215495e+35 = ~3.2e-9 = ~0.00000032%

And if you *do* get unlucky you can expect to lose 1/10000 = ~0.01% of your data.
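
Same sketch for the 10+5 EC case, under the same assumption that each PG is a distinct, uniformly random 15-set of OSDs:

    import math

    osds = 1500
    pgs_per_osd = 100
    width = 15           # k + m = 10 + 5 OSDs per PG
    failures = 6         # m + 1 failures are needed to lose data

    pgs = osds * pgs_per_osd // width      # 10000 15-sets
    possible = math.comb(osds, width)      # ~3.12e+35 possible 15-sets
    hit = math.comb(osds - failures, width - failures)
                                           # 15-sets containing all 6 failed OSDs: ~9.97e+22

    print(f"{pgs * hit / possible:.8%}")   # chance of any data loss: ~0.00000032%
    print(f"{1 / pgs:.2%}")                # data lost if unlucky: ~0.01%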

So your chance of data loss is much smaller with such a wide EC encoding, but if you do lose a PG you'll lose more data because there are fewer PGs.
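
If anyone wants to play with other encodings, the whole estimate collapses into one little helper (the naming is mine, and it again assumes distinct, uniformly random OSD sets per PG). It reproduces both numbers above, which also shows that the replication case really is the same formula with a set size of 3:

    import math

    def p_any_loss(osds, pgs_per_osd, set_size, failures):
        # Approximate chance that `failures` simultaneous random OSD
        # failures take out at least one PG (i.e. any data loss).
        pgs = osds * pgs_per_osd // set_size
        return pgs * math.comb(osds - failures, set_size - failures) / math.comb(osds, set_size)

    print(p_any_loss(1500, 100, 3, 3))    # replication x3: ~8.9e-05
    print(p_any_loss(1500, 100, 15, 6))   # 10+5 EC:        ~3.2e-09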

Feedback on my math welcome.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
