[ https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yiqun Lin updated HDFS-11377:
-----------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

The remove operation should be safe, since the method {{removePendingBlock}} is {{synchronized}}. The failed test is not related.
Committed to trunk and branch-2.
Thanks [~zhaoyunjiong] for the contribution and thanks [~manojg] for the review!

> Balancer hung due to no available mover threads
> -----------------------------------------------
>
>                 Key: HDFS-11377
>                 URL: https://issues.apache.org/jira/browse/HDFS-11377
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>             Fix For: 2.9.0, 3.0.0-alpha3
>
>         Attachments: HDFS-11377.001.patch, HDFS-11377.002.patch
>
>
> When running the balancer on a large cluster with more than 3000 Datanodes,
> it may hang due to "No mover threads available".
> The stack trace shows it waiting forever, as below.
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x00007ff6cc014800 nid=0x6b2c waiting on condition [0x00007ff6d1bad000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1043)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1017)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:981)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:611)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:663)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:776)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:905)
> {code}
> The log contains many WARN messages about "No mover threads available".
> {quote}
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13700554102_1112815018180 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_4009558842_1103118359883 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13881956058_1112996460026 with size=133509566 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.36:50010
> {quote}
> What happens here is that, when no mover threads are available, the skipped
> moves stay in the pending queue, so DDatanode.isPendingQEmpty() keeps
> returning false and the Balancer hangs in waitForMoveCompletion.
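
The hang described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the Dispatcher code or the committed patch: the class PendingMoveSketch, the Node type, and runOnce are hypothetical names, and a one-thread pool with no work queue stands in for the exhausted mover pool. Only the names removePendingBlock and isPendingQEmpty are borrowed from the discussion. It shows that a move which is registered as pending and then rejected by the executor leaves the pending queue non-empty unless the rejection path removes it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PendingMoveSketch {
    /** Hypothetical stand-in for a datanode's pending-move bookkeeping. */
    static class Node {
        private final List<String> pending = new ArrayList<>();
        synchronized void addPendingBlock(String b) { pending.add(b); }
        // synchronized, so removal is safe from both the dispatching
        // thread and a mover thread.
        synchronized boolean removePendingBlock(String b) { return pending.remove(b); }
        synchronized boolean isPendingQEmpty() { return pending.isEmpty(); }
    }

    /**
     * Submits two moves to a pool that can accept only one, so the second
     * is rejected. Returns whether the pending queue is empty once the
     * accepted move has finished.
     */
    static boolean runOnce(boolean cleanUpOnReject) {
        Node node = new Node();
        // One thread and a SynchronousQueue: no capacity to queue extra work.
        ThreadPoolExecutor movers = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        for (String block : new String[] {"blk_1", "blk_2"}) {
            node.addPendingBlock(block);
            try {
                movers.execute(() -> {
                    try {
                        release.await(); // keep the only mover thread busy
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    node.removePendingBlock(block); // normal completion path
                });
            } catch (RejectedExecutionException e) {
                // The "no mover threads available" case: without cleanup,
                // the block stays pending forever.
                if (cleanUpOnReject) {
                    node.removePendingBlock(block);
                }
            }
        }
        release.countDown();
        movers.shutdown();
        try {
            movers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return node.isPendingQEmpty();
    }

    public static void main(String[] args) {
        System.out.println("no cleanup, queue empty:   " + runOnce(false));
        System.out.println("with cleanup, queue empty: " + runOnce(true));
    }
}
```

A loop that sleeps until isPendingQEmpty() returns true would never exit in the first case, which matches the TIMED_WAITING main thread in the stack trace. In the second case the rejected move is removed on the spot, so such a loop can terminate.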