[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791457#comment-14791457 ]
He Tianyi commented on HDFS-9090:
---------------------------------

I am not quite sure, but I think this is orthogonal to {{BlockPlacementPolicy}}.

Assume that HDFS-7068 is implemented. In that case, one can configure a {{BlockPlacementPolicy}} for a specified INode, so writes under a particular directory can certainly be forced to scatter data across the cluster.

But since {{BlockPlacementPolicy}} focuses on where replicas should be located, each policy may have to split into two versions (with write locality and without). That is, we have {{BlockPlacementPolicyDefault}}, so perhaps we would need a {{BlockPlacementPolicyDefaultWithoutWriteLocality}}. And for a real case, we have {{BlockPlacementPolicyWithMultiDC}}, so perhaps we would also need a {{BlockPlacementPolicyWithMultiDCWithoutWriteLocality}}. Granted, the latter could be implemented by just overriding several methods.

Based on that, how about adding one parameter, perhaps named "localityLevel", to {{chooseTarget}}? Then each policy can apply its own considerations without the burden of implementing two versions. This would also work when multiple policies are not supported.

> Write hot data on few nodes may cause performance issue
> -------------------------------------------------------
>
>                 Key: HDFS-9090
>                 URL: https://issues.apache.org/jira/browse/HDFS-9090
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: He Tianyi
>            Assignee: He Tianyi
>
> (I am not sure whether this should be reported as a BUG; feel free to modify
> this.)
> The current block placement policy makes a best effort to place the first
> replica on the local node whenever possible.
> Consider the following scenario:
> 1. There are 500 datanodes across plenty of racks.
> 2. Raw user action logs (just an example) are being written on only 10 nodes,
> which also have a datanode deployed locally.
> 3. Then, before any balancing, all these logs will have at least one replica
> on those 10 nodes, implying that one third of the reads on these logs will be
> served by those 10 nodes if the replication factor is 3, so performance suffers.
> I propose to solve this by introducing a configuration entry that lets the
> client disable an arbitrary level of write locality.
> Then we can either (A) add local nodes to excludedNodes, or (B) tell the
> NameNode the locality we prefer.
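
To make option (B) and the "localityLevel" idea from the comment more concrete, here is a rough, self-contained sketch. It is not Hadoop code: {{LocalityLevelSketch}}, {{LocalityLevel}}, {{Node}}, and {{chooseFirstTarget}} are hypothetical names made up for illustration, the three levels (NODE, RACK, NONE) are an assumed reading of "arbitrary level of write locality", and the real {{chooseTarget}} signature is deliberately not reproduced. The sketch only picks the first replica, and the fallback to the whole cluster when nothing matches is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustrative only: none of these types are Hadoop classes.
class LocalityLevelSketch {

    // Hypothetical values for the proposed "localityLevel" parameter.
    enum LocalityLevel { NODE, RACK, NONE }

    // Simplified stand-in for a datanode (name plus rack).
    static final class Node {
        final String name;
        final String rack;
        Node(String name, String rack) { this.name = name; this.rack = rack; }
    }

    // Chooses the target for the first replica only.
    // One policy serves every locality level, so no
    // "...WithoutWriteLocality" subclass is needed.
    static Node chooseFirstTarget(Node writer, List<Node> cluster,
                                  LocalityLevel level, Random rng) {
        if (cluster.isEmpty()) {
            throw new IllegalArgumentException("no datanodes available");
        }
        List<Node> candidates = new ArrayList<>();
        for (Node n : cluster) {
            boolean matches;
            switch (level) {
                case NODE: matches = n.name.equals(writer.name); break;
                case RACK: matches = n.rack.equals(writer.rack); break;
                default:   matches = true; // NONE: ignore the writer's location
            }
            if (matches) {
                candidates.add(n);
            }
        }
        // Assumption: fall back to the whole cluster when nothing matches,
        // e.g. the writer does not run a datanode.
        if (candidates.isEmpty()) {
            candidates = cluster;
        }
        return candidates.get(rng.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        List<Node> cluster = Arrays.asList(
            new Node("dn1", "rackA"), new Node("dn2", "rackA"),
            new Node("dn3", "rackB"), new Node("dn4", "rackB"));
        Node writer = cluster.get(0);
        Random rng = new Random(42);
        for (LocalityLevel level : LocalityLevel.values()) {
            Node target = chooseFirstTarget(writer, cluster, level, rng);
            System.out.println(level + " -> " + target.name + " (" + target.rack + ")");
        }
    }
}
```

With NONE, the first replica of the hot logs would be spread over all datanodes instead of staying on the 10 writer nodes, while NODE keeps the local-first behaviour described in the issue.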