[ https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584179#comment-15584179 ]
Zhe Zhang commented on HDFS-11015:
----------------------------------

Thanks for the patch Kihwal. A while ago I reported a similar issue HDFS-10977.

> Enforce timeout in balancer
> ---------------------------
>
>                 Key: HDFS-11015
>                 URL: https://issues.apache.org/jira/browse/HDFS-11015
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: HDFS-11015-1.patch
>
>
> 1) Hung node detection: HDFS-6247 has removed the socket read timeout while
> adding the periodic response for slow block moves. However, the removal of
> the long timeout wasn't necessary. The timeout is still useful for avoiding
> hung nodes and does not abort slow moves.
> 2) Enforcing the iteration limit: The 20 minute iteration limit is supposed to
> be enforced, but it is not. An iteration can easily stretch to 30 to 40
> minutes with a long tail. Because of the long tails, the balancer throughput
> does not reach its full potential.
> 3) Slow move detection: For improved throughput, imposing block move timeout
> is sometimes necessary. We have seen an iteration taking over 2 hours
> because of one slow block move. This is mainly for catching exceptionally
> slow moves. Even if the balancer stops waiting, the move will continue and
> finish.
> In order to not undo what HDFS-6247 tried to achieve, it should be possible
> to configure off 3).
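
For readers following the issue description, items 2) and 3) can be illustrated with a small sketch. This is not the code in HDFS-11015-1.patch; the class, method, and parameter names below (BalancerTimeoutSketch, PendingMove, maxIterationMs, moveTimeoutMs) are hypothetical and chosen only for illustration. The sketch assumes the balancer tracks a start time per pending move and stops waiting once either the iteration limit or the per-move timeout is exceeded, and that a per-move timeout of 0 turns check 3) off. Item 1) would correspond to an ordinary socket read timeout and is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class BalancerTimeoutSketch {

    /** A pending block move, identified by a block id and its start time (hypothetical type). */
    static final class PendingMove {
        final String blockId;
        final long startNanos;

        PendingMove(String blockId, long startNanos) {
            this.blockId = blockId;
            this.startNanos = startNanos;
        }
    }

    /** Item 3): a move is "slow" once it exceeds moveTimeoutMs; a value <= 0 disables the check. */
    static boolean isMoveTimedOut(PendingMove move, long nowNanos, long moveTimeoutMs) {
        if (moveTimeoutMs <= 0) {
            return false; // assumption: 0 means "configured off"
        }
        return nowNanos - move.startNanos > TimeUnit.MILLISECONDS.toNanos(moveTimeoutMs);
    }

    /** Item 2): the iteration is over once maxIterationMs has elapsed since it started. */
    static boolean isIterationExpired(long iterStartNanos, long nowNanos, long maxIterationMs) {
        return nowNanos - iterStartNanos >= TimeUnit.MILLISECONDS.toNanos(maxIterationMs);
    }

    /**
     * Returns the moves the balancer should keep waiting for. Moves that are
     * dropped from the list are not cancelled here; the balancer merely stops
     * waiting for them.
     */
    static List<PendingMove> movesStillAwaited(List<PendingMove> pending, long iterStartNanos,
            long nowNanos, long maxIterationMs, long moveTimeoutMs) {
        List<PendingMove> awaited = new ArrayList<>();
        if (isIterationExpired(iterStartNanos, nowNanos, maxIterationMs)) {
            return awaited; // iteration limit reached: wait for nothing
        }
        for (PendingMove move : pending) {
            if (!isMoveTimedOut(move, nowNanos, moveTimeoutMs)) {
                awaited.add(move);
            }
        }
        return awaited;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<PendingMove> pending = new ArrayList<>();
        pending.add(new PendingMove("blk_1", start));
        // 20 minute iteration limit as in the description; the 10 minute move timeout is made up.
        long maxIterationMs = TimeUnit.MINUTES.toMillis(20);
        long moveTimeoutMs = TimeUnit.MINUTES.toMillis(10);
        List<PendingMove> awaited =
                movesStillAwaited(pending, start, System.nanoTime(), maxIterationMs, moveTimeoutMs);
        System.out.println("moves still awaited: " + awaited.size());
    }
}
```

Keeping the two checks separate mirrors the request in the description that 3) can be switched off on its own without affecting the iteration limit.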