[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856139#action_12856139 ]
Brian Bockelman commented on HDFS-1094:
---------------------------------------

Hey Karthik,

Let me play dumb (it might not be playing after all) and try to work out the math a bit. First, let's assume that on any given day a node has a 1/1000 chance of failing.

CURRENT SCHEME: A block is on 3 random nodes. The block is lost only on a simultaneous failure of nodes X, Y, and Z. Let's assume these failures are independent:

P(X and Y and Z) = P(X) * P(Y) * P(Z) = 1 in a billion.

PROPOSED SCHEME: Well, the probability is the same. So, given a specific block, we don't change the probability that it is lost.

What you seem to be calculating is the probability that three nodes go down out of N nodes:

P(nodes X, Y, and Z fail, for any three distinct X, Y, Z) = 1 - P(N-3 nodes stay up) = 1 - (999/1000)^(N-3)

Sure enough, if you use a small subset (N=40, say), then the probability of 3 nodes failing is smaller for a small subset than for the whole cluster. However, that's not the number you want! You want the probability that *any* block is lost when three nodes go down. That is,

P(nodes X, Y, and Z fail, for any three distinct X, Y, Z, and X, Y, Z hold at least one block in common)

(call this P_1). Assuming that block overlap, node death, and the choice of subset are all independent, you get:

P_1 = P(three nodes have at least one common block) * P(3-node death) * (# of distinct 3-node subsets)

The first factor decreases with N, the second is constant in N, and the third increases with N. The third is a well-known formula (N choose 3), but I don't have a good formula for the first factor. Unless you can calculate or estimate that first factor, I don't think you can really say anything about decreasing the value of P_1.

I *think* we are incorrectly treating the probability of data loss as proportional to the probability of 3 machines in a subset being lost, without taking into account the probability of common blocks. The probabilities get tricky, hence my asking for someone to sketch it out mathematically... (A rough simulation sketch of how P_1 could be estimated is appended below the quoted issue description.)

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> The current HDFS implementation specifies that the first replica is local and
> the other two replicas are on any two random nodes on a random remote rack.
> This means that if any three datanodes die together, then there is a
> non-trivial probability of losing at least one block in the cluster. This
> JIRA is to discuss if there is a better algorithm that can lower the
> probability of losing a block.
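To make the argument concrete, here is a minimal Monte Carlo sketch of how P_1 (the probability that *any* block loses all of its replicas in a day) could be estimated for both placement schemes. This is not HDFS code, and every number in it (cluster size, block count, per-node failure rate, group size, trial count) is a made-up illustrative assumption; it only shows the shape of the calculation.

{code:python}
# Toy Monte Carlo estimate of P_1 = P(any block loses all replicas in one day).
# All parameters are illustrative assumptions, not measurements from a real cluster.
import random

N_NODES = 200          # cluster size (kept small so the simulation stays cheap)
N_BLOCKS = 50_000      # number of blocks in the cluster
REPLICATION = 3        # replicas per block
P_FAIL = 0.01          # per-node, per-day failure probability (inflated so losses are visible)
GROUP_SIZE = 40        # node-group size for the "small subset" placement scheme
TRIALS = 1000          # simulated days

def place_random():
    """Current scheme (idealized): each block's replicas land on 3 random nodes."""
    return [frozenset(random.sample(range(N_NODES), REPLICATION))
            for _ in range(N_BLOCKS)]

def place_grouped():
    """Proposed scheme (idealized): each block's replicas stay inside one group of GROUP_SIZE nodes."""
    n_groups = N_NODES // GROUP_SIZE
    placements = []
    for _ in range(N_BLOCKS):
        g = random.randrange(n_groups)
        group_nodes = range(g * GROUP_SIZE, (g + 1) * GROUP_SIZE)
        placements.append(frozenset(random.sample(group_nodes, REPLICATION)))
    return placements

def estimate_p1(placements):
    """Fraction of simulated days on which at least one block lost every replica."""
    loss_days = 0
    for _ in range(TRIALS):
        # Each node fails independently with probability P_FAIL on a given day.
        dead = {n for n in range(N_NODES) if random.random() < P_FAIL}
        # A block is lost only if all of its replica nodes are in the dead set.
        if len(dead) >= REPLICATION and any(replicas <= dead for replicas in placements):
            loss_days += 1
    return loss_days / TRIALS

if __name__ == "__main__":
    print("P_1, random placement:  %.4f" % estimate_p1(place_random()))
    print("P_1, grouped placement: %.4f" % estimate_p1(place_grouped()))
{code}

The point of the sketch is only that P_1 couples all three factors above: confining replicas to small groups shrinks the number of vulnerable 3-node subsets but raises the chance that a failed subset shares a block, so the two schemes can only be compared by estimating the whole product, not the subset-failure factor alone.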