[ https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961023#comment-16961023 ]
Istvan Fajth commented on HDFS-14882:
-------------------------------------

Hello [~hexiaoqiao],

I have looked into the patch and the proposal. The changes look good and, as far as I can see, do what they promise, but I have one question/suggestion to consider as an alternative to the current approach when dfs.namenode.read.considerLoad is set to true.

In NetworkTopology#sortByDistance we already sort the nodes by network distance, and nodes at the same distance are shuffled, which strives to ensure some distribution of load. That shuffle can be regarded as a secondary sorting strategy, and we could make it injectable from outside at that point.

If we inject the secondary sorting from DatanodeManager, then when read.considerLoad is turned on we can inject a sort by transceiver count in place of the shuffle. This way we avoid calculating the network distance twice, and we also avoid shuffling and then sorting by transceiver count.

I am posting a proposal just to demonstrate what exactly I have in mind. The JUnit test in patch-008 passes with it; I haven't tried other tests locally.

Please share what you think about this approach. I would also be happy to get feedback from [~ayushtkn] and [~elgoiri].

> Consider DataNode load when #getBlockLocation
> ---------------------------------------------
>
>                 Key: HDFS-14882
>                 URL: https://issues.apache.org/jira/browse/HDFS-14882
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Major
>         Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch,
> HDFS-14882.003.patch, HDFS-14882.004.patch, HDFS-14882.005.patch,
> HDFS-14882.006.patch, HDFS-14882.007.patch, HDFS-14882.008.patch
>
>
> Currently, we consider load of datanode when #chooseTarget for writer,
> however not consider it for reader.
> Thus, the process slot of datanode could
> be occupied by #BlockSender for reader, and disk/network will be busy
> workload, then meet some slow node exception. IIRC same case is reported
> times. Based on the fact, I propose to consider load for reader same as it
> did #chooseTarget for writer.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
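
The secondary-sort injection suggested in the comment above could look roughly like the following minimal Java sketch. It is neither the posted proposal nor Hadoop's actual API: the class SecondarySortSketch, the Node type, and the method names are hypothetical, and network distance is modelled as a precomputed integer instead of being derived from the topology.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.function.ToIntFunction;

class SecondarySortSketch {

  /** Hypothetical stand-in for a datanode; not an HDFS class. */
  static final class Node {
    final String name;
    final int distance;      // assumed: precomputed network distance to the reader
    final int xceiverCount;  // assumed: current transceiver count (load)

    Node(String name, int distance, int xceiverCount) {
      this.name = name;
      this.distance = distance;
      this.xceiverCount = xceiverCount;
    }
  }

  /**
   * Orders nodes by distance (nearest first) and applies the injected
   * secondary sort to each group of nodes that share the same distance.
   * The distance of every node is evaluated only once.
   */
  static <T> List<T> sortByDistance(List<T> nodes,
                                    ToIntFunction<T> distance,
                                    Consumer<List<T>> secondarySort) {
    // TreeMap iterates its keys in ascending order, so nearest groups come first.
    Map<Integer, List<T>> byDistance = new TreeMap<>();
    for (T node : nodes) {
      byDistance
          .computeIfAbsent(distance.applyAsInt(node), d -> new ArrayList<>())
          .add(node);
    }
    List<T> result = new ArrayList<>(nodes.size());
    for (List<T> group : byDistance.values()) {
      secondarySort.accept(group);  // shuffle, or sort by load
      result.addAll(group);
    }
    return result;
  }

  /** Picks the secondary strategy the caller would inject. */
  static Consumer<List<Node>> chooseSecondarySort(boolean considerLoad) {
    if (considerLoad) {
      // Least-loaded node first within each distance group.
      return group -> group.sort(
          Comparator.comparingInt((Node n) -> n.xceiverCount));
    }
    // Default behaviour: spread load by shuffling same-distance nodes.
    return Collections::shuffle;
  }

  public static void main(String[] args) {
    List<Node> nodes = Arrays.asList(
        new Node("dn1", 2, 9),
        new Node("dn2", 0, 5),
        new Node("dn3", 2, 1),
        new Node("dn4", 0, 3));
    List<Node> sorted =
        sortByDistance(nodes, n -> n.distance, chooseSecondarySort(true));
    for (Node n : sorted) {
      System.out.println(n.name);  // dn4, dn2, dn3, dn1 (one per line)
    }
  }
}
```

With considerLoad set to false the same call shuffles within each distance group, so only the distance ordering is deterministic in that mode.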