[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900148#comment-14900148 ]
He Tianyi commented on HDFS-9090:
---------------------------------

The combination theory makes sense. Thanks, guys. Since multiple placement policies are not supported yet, I took the approach of having DFSClient add the nodes in the local rack to {{excludeNodes}} during calls to {{getAdditionalBlock}}. This is ugly, but it solves the problem for now. I'll wait for either HDFS-4894 or HDFS-7068 to be implemented, then use a custom policy without write locality only for this data.

> Write hot data on few nodes may cause performance issue
> -------------------------------------------------------
>
>                 Key: HDFS-9090
>                 URL: https://issues.apache.org/jira/browse/HDFS-9090
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: He Tianyi
>            Assignee: He Tianyi
>
> (I am not sure whether this should be reported as a BUG; feel free to modify this.)
> The current block placement policy makes a best effort to place the first replica on the local node whenever possible.
> Consider the following scenario:
> 1. There are 500 datanodes across plenty of racks.
> 2. Raw user action logs (just an example) are being written from only 10 nodes, each of which also has a datanode deployed locally.
> 3. Then, before any balancing, all of these logs will have at least one replica on those 10 nodes. With a replication factor of 3, one third of the reads on these logs will be served by those 10 nodes, and performance suffers.
> I propose to solve this scenario by introducing a configuration entry that lets the client disable an arbitrary level of write locality.
> Then we can either (A) add local nodes to excludedNodes, or (B) tell the NameNode the locality we prefer.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
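The client-side workaround described in the comment can be sketched roughly as follows. This is a simplified, hypothetical model in plain Python, not the actual DFSClient/NameNode Java code: `choose_replica_targets` stands in for the NameNode's placement decision, and `DataNode`, `targets_avoiding_local_rack`, and the rack naming are all illustrative assumptions. The point it shows is just the mechanism: the writer adds every datanode in its own rack to the excluded set before requesting block targets, so no replica of the hot data lands locally.

```python
import random
from dataclasses import dataclass

# Hypothetical, simplified model of the workaround from the comment:
# the writing client excludes its own rack's datanodes so that no
# replica of hot data is placed locally. NOT the real HDFS code.

@dataclass(frozen=True)
class DataNode:
    name: str
    rack: str

def choose_replica_targets(candidates, repl_factor, excluded):
    """Stand-in for the NameNode's target choice: pick repl_factor
    nodes at random, skipping anything in the excluded set."""
    eligible = [dn for dn in candidates if dn not in excluded]
    return random.sample(eligible, repl_factor)

def targets_avoiding_local_rack(candidates, repl_factor, client_rack):
    """The workaround: the client adds every datanode in its own rack
    to the excluded set before asking for block targets."""
    local_nodes = {dn for dn in candidates if dn.rack == client_rack}
    return choose_replica_targets(candidates, repl_factor, local_nodes)

# 500 datanodes spread over 50 racks; the writer sits in rack0.
cluster = [DataNode(f"dn{i}", f"rack{i % 50}") for i in range(500)]
targets = targets_avoiding_local_rack(cluster, repl_factor=3, client_rack="rack0")
# No chosen replica sits on the writer's rack, so reads of the hot
# data are spread across the rest of the cluster.
assert all(dn.rack != "rack0" for dn in targets)
```

In the real cluster the same idea applies per `getAdditionalBlock` call: the exclusion is purely client-side, which is why the comment calls it ugly compared with a proper custom placement policy (HDFS-4894 or HDFS-7068).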