[jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900148#comment-14900148 ]

He Tianyi commented on HDFS-9090:
---------------------------------

The combination argument makes sense. Thanks, guys.
Since multiple placement policies are not supported yet, I took the approach of having DFSClient add the nodes in the local rack to {{excludeNodes}} when calling {{getAdditionalBlock}}. This is ugly, but it solves the problem for now.
I'll wait for either HDFS-4894 or HDFS-7068 to be implemented, then use a custom policy without write locality for this data only.

> Write hot data on few nodes may cause performance issue
> -------------------------------------------------------
>
> Key: HDFS-9090
> URL: https://issues.apache.org/jira/browse/HDFS-9090
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.3.0
> Reporter: He Tianyi
> Assignee: He Tianyi
>
> (I am not sure whether this should be reported as BUG, feel free to modify this)
> The current block placement policy makes a best effort to place the first replica on the local node whenever possible.
> Consider the following scenario:
> 1. There are 500 datanodes across plenty of racks,
> 2. Raw user action logs (just an example) are being written on only 10 nodes, which also have a datanode deployed locally,
> 3. Then, before any balancing, all these logs will have at least one replica on those 10 nodes, implying that one third of the reads of these logs will be served by these 10 nodes if the replication factor is 3, so performance suffers.
> I propose to solve this scenario by introducing a configuration entry for the client to disable an arbitrary level of write locality.
> Then we can either (A) add local nodes to excludedNodes, or (B) tell the NameNode the locality we prefer.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
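To put rough numbers on the scenario in the issue description and on the exclude-the-local-rack workaround, here is a toy simulation. It is only a sketch and not HDFS code: `place_block`, `writer_replica_share`, the 20-nodes-per-rack layout, and the simplified "first replica local, the rest anywhere" rule are all assumptions made for illustration, and real rack-awareness rules are ignored. If readers pick a replica uniformly at random, the share of replicas on the writer nodes approximates the share of reads they serve.

```python
import random


def place_block(writer, nodes, rack_of, exclude_local_rack, rng, replication=3):
    """Return the nodes holding one block's replicas (toy model, not HDFS code)."""
    if exclude_local_rack:
        # Workaround: every node in the writer's rack is treated as excluded,
        # so all replicas are drawn from the rest of the cluster.
        candidates = [n for n in nodes if rack_of[n] != rack_of[writer]]
        return rng.sample(candidates, replication)
    # Simplified default behaviour: first replica on the writer's own node,
    # remaining replicas on any other nodes (real rack rules are ignored).
    others = [n for n in nodes if n != writer]
    return [writer] + rng.sample(others, replication - 1)


def writer_replica_share(exclude_local_rack, num_nodes=500, nodes_per_rack=20,
                         num_writers=10, blocks=5000, replication=3, seed=0):
    """Fraction of all replicas that end up on the writer nodes."""
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    rack_of = {n: n // nodes_per_rack for n in nodes}
    # Assumed layout: writers sit in different racks (nodes 0, 50, 100, ...).
    writers = nodes[::num_nodes // num_writers]
    writer_set = set(writers)
    on_writers = 0
    for i in range(blocks):
        writer = writers[i % num_writers]
        replicas = place_block(writer, nodes, rack_of, exclude_local_rack,
                               rng, replication)
        on_writers += sum(1 for n in replicas if n in writer_set)
    return on_writers / (replication * blocks)


if __name__ == "__main__":
    print("share with write locality:   ", writer_replica_share(False))
    print("share with local rack excluded:", writer_replica_share(True))
```

With write locality, the 10 writers (2% of the nodes) always hold at least one third of the replicas; with the local rack excluded, their share falls to about what a uniform spread would give.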
[jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877135#comment-14877135 ]

Steve Loughran commented on HDFS-9090:
--------------------------------------

I'm in favour of multiple placement policies, not options on the current one, with those policies designed to share as much code as they can (i.e. not the way the YARN schedulers currently work).
Allowing apps to choose a placement policy when creating an FS client instance would let you deploy different configs for different parts of the cluster; this would tie in well with label-driven placement.
[jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791500#comment-14791500 ]

Walter Su commented on HDFS-9090:
---------------------------------

bq. Based on that, how about add one parameter, perhaps named "localityLevel" to chooseTarget

HDFS-8390 tries to do the same thing. But it isn't worth complicating the default policy if this is not a popular demand. If 10 users each have their own special need, we end up with 2^10 combinations of options. It would be a disaster to put all of them in the default policy.
[jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791457#comment-14791457 ]

He Tianyi commented on HDFS-9090:
---------------------------------

I'm not quite sure, but I think this is orthogonal to {{BlockPlacementPolicy}}.
Assume that HDFS-7068 is implemented. In that case, one can configure a {{BlockPlacementPolicy}} for a specified INode, so writes under a particular directory can certainly be forced to scatter data across the cluster.
But, given that {{BlockPlacementPolicy}} focuses on where replicas should be located, each policy may split into two versions (with write locality, and without).
That is, we have {{BlockPlacementPolicyDefault}}, so perhaps we need a {{BlockPlacementPolicyDefaultWithoutWriteLocality}}. And for a real case, we have {{BlockPlacementPolicyWithMultiDC}}, so perhaps we also need a {{BlockPlacementPolicyWithMultiDCWithoutWriteLocality}}. Granted, the latter could be implemented by overriding just a few methods.
Based on that, how about adding one parameter, perhaps named "localityLevel", to {{chooseTarget}}, so that each policy can handle locality in its own way without the burden of implementing two versions? This could also work while multiple policies are not supported.
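The "one parameter instead of two classes" idea can be sketched as follows. This is a toy model under stated assumptions, not the Hadoop {{chooseTarget}} API: the `LocalityLevel` enum, its three values, and the simplified `choose_target` signature are all hypothetical and exist only to show how a single policy could honour a caller-supplied locality level.

```python
import random
from enum import Enum


class LocalityLevel(Enum):
    """Hypothetical levels; the thread only proposes the parameter name."""
    NODE_LOCAL = "node"    # first replica on the writer's own node
    RACK_LOCAL = "rack"    # first replica somewhere in the writer's rack
    NO_LOCALITY = "none"   # first replica anywhere in the cluster


def choose_target(writer, nodes, rack_of, replication, locality_level, rng):
    """Pick `replication` distinct nodes, honouring the requested locality."""
    if locality_level is LocalityLevel.NODE_LOCAL:
        first = writer
    elif locality_level is LocalityLevel.RACK_LOCAL:
        same_rack = [n for n in nodes if rack_of[n] == rack_of[writer]]
        first = rng.choice(same_rack)
    else:
        first = rng.choice(nodes)
    # Remaining replicas: any other nodes (real rack rules are ignored here).
    rest = rng.sample([n for n in nodes if n != first], replication - 1)
    return [first] + rest


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = list(range(12))
    rack_of = {n: n // 4 for n in nodes}  # three racks of four nodes
    for level in LocalityLevel:
        print(level.name, choose_target(5, nodes, rack_of, 3, level, rng))
```

The design point is that the locality decision lives in one branch of one method, so a policy such as the multi-DC one mentioned above would not need a second "WithoutWriteLocality" subclass.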
[jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791410#comment-14791410 ]

Walter Su commented on HDFS-9090:
---------------------------------

bq. The placement policy in the erasure coding branch achieves the goal of spreading the data across racks.

It could still burden the local racks where the 10 Storm nodes are located.

Maybe you can customize a policy: just override {{chooseTargetInOrder}} so that it uses {{chooseRandom}} for every replica. But that hurts the performance of YARN applications.
HDFS-4894 or HDFS-7068 could be very helpful, but they are not implemented yet.
Maybe you should take the advice from [~ste...@apache.org] and use ingest nodes.
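The suggested customization can be pictured with a small class sketch. It is a toy stand-in, not the real Hadoop classes: the method names only mirror {{chooseTargetInOrder}} and {{chooseRandom}} from the comment, while the class names `DefaultLikePolicy` and `FullyRandomPolicy`, the signatures, and the "first replica local, rest random" base behaviour are assumptions for illustration.

```python
import random


class DefaultLikePolicy:
    """Toy stand-in for a placement policy; not the real Hadoop class."""

    def __init__(self, nodes, seed=0):
        self.nodes = list(nodes)
        self.rng = random.Random(seed)

    def choose_random(self, count, excluded=()):
        """Pick `count` distinct nodes uniformly, skipping `excluded`."""
        candidates = [n for n in self.nodes if n not in excluded]
        return self.rng.sample(candidates, count)

    def choose_target_in_order(self, writer, replication):
        """Writer-local first replica, then random nodes for the rest."""
        return [writer] + self.choose_random(replication - 1, excluded={writer})


class FullyRandomPolicy(DefaultLikePolicy):
    """Override: every replica, including the first, is chosen at random."""

    def choose_target_in_order(self, writer, replication):
        return self.choose_random(replication)


if __name__ == "__main__":
    nodes = range(100)
    print(DefaultLikePolicy(nodes).choose_target_in_order(7, 3))
    print(FullyRandomPolicy(nodes).choose_target_in_order(7, 3))
```

The trade-off named in the comment shows up directly: with the override, a job that writes data and reads it back on the same node rarely finds a local replica.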
[jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790984#comment-14790984 ]

Zhe Zhang commented on HDFS-9090:
---------------------------------

[~ste...@apache.org] Good thought. The placement policy in the erasure coding branch achieves the goal of spreading the data across racks. [~walter.k.su] did the work under HDFS-8186. Right now we are switching the policy on for EC files only.
[jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790966#comment-14790966 ]

Steve Loughran commented on HDFS-9090:
--------------------------------------

This deployment makes sense: the traditional placement policy is based on the assumption that the writer of the data may want to read it again, so it leaves a copy close by. That doesn't hold if it's something that does want to spread its data across the racks, with the expectation that other work will read it in.

It doesn't just hurt disk usage; it would also bias YARN workloads to run on those nodes, or at least on the same rack.
[jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790745#comment-14790745 ]

Zhe Zhang commented on HDFS-9090:
---------------------------------

This could also be addressed through multiple / customizable placement policies: HDFS-4894 and HDFS-7068.
[jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14769041#comment-14769041 ]

He Tianyi commented on HDFS-9090:
---------------------------------

Alternatively, from a different perspective, perhaps we could consider adding a {{considerLoad}} option for {{getBlockLocations}} as well, which would affect the sort weight of overloaded DataNodes. This is similar to {{considerLoad}} during writes.
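One way to picture a read-side {{considerLoad}} is a sort key that pushes overloaded replicas behind the others. The sketch below is a toy model, not NameNode code: the function `sort_block_locations`, the per-node `load` and `distance` maps, and the "more than `factor` times the average load" threshold are all assumptions chosen for illustration.

```python
def sort_block_locations(replicas, distance, load, avg_load,
                         consider_load=True, factor=2.0):
    """Order replicas for a reader: closest first, overloaded nodes last.

    distance[n] is the (assumed) network distance from the reader to node n,
    load[n] is an (assumed) per-node load metric such as active transfers.
    """
    def sort_key(node):
        overloaded = consider_load and load[node] > factor * avg_load
        # False sorts before True, so overloaded nodes move to the back;
        # within each group the closest node still comes first.
        return (overloaded, distance[node])

    return sorted(replicas, key=sort_key)


if __name__ == "__main__":
    replicas = ["a", "b", "c"]
    distance = {"a": 0, "b": 2, "c": 4}     # "a" is local to the reader
    load = {"a": 100, "b": 10, "c": 10}     # but "a" is heavily loaded
    print(sort_block_locations(replicas, distance, load, avg_load=12))
    print(sort_block_locations(replicas, distance, load, avg_load=12,
                               consider_load=False))
```

In this example the heavily loaded local replica `a` drops to the end of the list when the option is on, and stays first when it is off.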
[jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14768973#comment-14768973 ]

He Tianyi commented on HDFS-9090:
---------------------------------

Thanks, [~ste...@apache.org].

My case may be a little rare. These writer nodes actually have Storm deployed, and it is Storm jobs that feed HDFS with logs.
Because of cost control and budget cycles, it is natural to deploy a DataNode on every machine that has enough hardware resources. (Otherwise it would be a waste to keep the hard disks of 'ingest nodes' almost empty.)

IMHO this could be a common scenario for medium-sized startups.
[jira] [Commented] (HDFS-9090) Write hot data on few nodes may cause performance issue
[ https://issues.apache.org/jira/browse/HDFS-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747311#comment-14747311 ]

Steve Loughran commented on HDFS-9090:
--------------------------------------

Moved to improvement.

Generally, work is balanced enough across a cluster that you don't get overloaded nodes; presumably this is a deployment where you have some servers dedicated to a specific role, rather than having YARN place it wherever it chooses.

FWIW, a lot of Hadoop clusters have 'edge nodes'/'ingest nodes' that (a) don't have any local DN and (b) are sometimes hooked straight to the top-level switch at 10 Gb/s, so they get great performance scattering data across the cluster.