[ https://issues.apache.org/jira/browse/HDFS-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885848#action_12885848 ]
Hairong Kuang commented on HDFS-1094:
-------------------------------------

Rodrigo, thanks for your explanation. Now I understand your proposal much better.

> We assume machine failures are independent.

I am not sure I agree with this assumption. From the attached analysis, the in-rack placement has the lowest data loss probability. This is counter-intuitive. In reality, the chance of losing a rack is not small, so a block placement policy normally places a block in at least two racks.

> Intelligent block placement policy to decrease probability of block loss
> ------------------------------------------------------------------------
>
>                 Key: HDFS-1094
>                 URL: https://issues.apache.org/jira/browse/HDFS-1094
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Rodrigo Schmidt
>         Attachments: prob.pdf, prob.pdf
>
>
> The current HDFS implementation specifies that the first replica is local and
> the other two replicas are on any two random nodes on a random remote rack.
> This means that if any three datanodes die together, then there is a
> non-trivial probability of losing at least one block in the cluster. This
> JIRA is to discuss if there is a better algorithm that can lower probability
> of losing a block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
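
As a rough illustration of the issue description's claim that three datanodes dying together has a non-trivial chance of losing a block, here is a minimal Python sketch. It is a toy model and not the analysis in prob.pdf. It assumes exactly three nodes fail at once, chosen uniformly at random (the independence assumption questioned above). It also assumes each block's three replicas occupy one of a fixed number of equally likely three-node sets. The function name `loss_probability` and its parameters are hypothetical, and the model ignores the actual HDFS rack rules.

```python
from math import comb
from typing import Optional


def loss_probability(num_nodes: int, num_blocks: int,
                     num_replica_sets: Optional[int] = None) -> float:
    """Toy estimate: chance that 3 simultaneous node failures lose a block.

    Assumptions (not taken from the JIRA attachment):
      * the 3 failed nodes are a uniformly random triple of nodes;
      * each block independently picks one of `num_replica_sets` allowed
        3-node sets, all equally likely;
      * if `num_replica_sets` is None, every triple of nodes is allowed
        (fully random placement).
    """
    all_triples = comb(num_nodes, 3)
    if num_replica_sets is None:
        num_replica_sets = all_triples
    if not 1 <= num_replica_sets <= all_triples:
        raise ValueError("num_replica_sets must be between 1 and C(num_nodes, 3)")

    # Probability that the failed triple is one the policy can use at all.
    p_triple_allowed = num_replica_sets / all_triples
    # Given that, probability that at least one block was placed on it.
    p_triple_in_use = 1.0 - (1.0 - 1.0 / num_replica_sets) ** num_blocks
    return p_triple_allowed * p_triple_in_use


if __name__ == "__main__":
    # Hypothetical cluster: 100 nodes holding one million blocks.
    print("fully random placement:", loss_probability(100, 10**6))
    print("1000 allowed node sets:", loss_probability(100, 10**6, 1000))
```

Under these assumptions, restricting placement to fewer node sets lowers the loss probability once the cluster holds many blocks, which matches the direction of the attached analysis. The sketch says nothing about correlated failures such as losing a whole rack, which is the concern raised in the comment above.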