[ 
https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11377:
-----------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

The remove operation should be safe since the method {{removePendingBlock}} is 
{{synchronized}}. The failed test is unrelated. Committed to trunk and 
branch-2. Thanks [~zhaoyunjiong] for the contribution and thanks [~manojg] for 
the review!
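
For readers of the archive, here is a minimal sketch of the behaviour discussed in this issue. It is not the committed patch and not the real Dispatcher code: the class {{PendingMoveSketch}}, the {{Node}} type, the {{dispatch}} method and its {{removeOnReject}} flag are all hypothetical, and only the names {{removePendingBlock}} and {{isPendingQEmpty}} come from the discussion. It assumes a mover pool that rejects work once its single thread is busy, and shows that the pending queue only drains if a rejected move is also removed from it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PendingMoveSketch {

    /** Hypothetical stand-in for a datanode's pending-move queue. */
    static class Node {
        private final List<String> pendings = new ArrayList<>();

        synchronized void addPendingBlock(String move) { pendings.add(move); }

        // synchronized, so the dispatching thread and a mover thread can both call it
        synchronized boolean removePendingBlock(String move) { return pendings.remove(move); }

        synchronized boolean isPendingQEmpty() { return pendings.isEmpty(); }
    }

    /**
     * Submits the given number of moves to a one-thread pool that has no backlog queue,
     * so every move after the first is rejected. Returns true if the pending
     * queue is empty once the bounded (one second) wait ends.
     */
    static boolean dispatch(int moves, boolean removeOnReject) {
        Node node = new Node();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);

        for (int i = 0; i < moves; i++) {
            String move = "blk_" + i;
            node.addPendingBlock(move);
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the only mover thread busy
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    node.removePendingBlock(move); // move finished
                });
            } catch (RejectedExecutionException e) {
                // "No mover threads available": without this removal the
                // move stays pending forever.
                if (removeOnReject) {
                    node.removePendingBlock(move);
                }
            }
        }

        release.countDown();
        // Bounded version of a "wait until nothing is pending" loop.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(1);
        try {
            while (!node.isPendingQEmpty() && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow();
        return node.isPendingQEmpty();
    }

    public static void main(String[] args) {
        System.out.println("rejected moves kept:    drained=" + dispatch(5, false));
        System.out.println("rejected moves removed: drained=" + dispatch(5, true));
    }
}
```

In this sketch the wait loop has a deadline so the example terminates; an unbounded loop of the same shape would never exit in the first case.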

> Balancer hung due to no available mover threads
> -----------------------------------------------
>
>                 Key: HDFS-11377
>                 URL: https://issues.apache.org/jira/browse/HDFS-11377
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>             Fix For: 2.9.0, 3.0.0-alpha3
>
>         Attachments: HDFS-11377.001.patch, HDFS-11377.002.patch
>
>
> When running the balancer on a large cluster with more than 3000 Datanodes, 
> it might hang due to "No mover threads available".
> The stack trace shows it waiting forever, as below.
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x00007ff6cc014800 nid=0x6b2c waiting on condition [0x00007ff6d1bad000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1043)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1017)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:981)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:611)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:663)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:776)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:905)
> {code}
> In the log, there are many WARN messages about "No mover threads available".
> {quote}
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13700554102_1112815018180 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_4009558842_1103118359883 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13881956058_1112996460026 with size=133509566 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.36:50010
> {quote}
> What happens here is that, when no mover threads are available, the skipped 
> moves stay pending, so DDatanode.isPendingQEmpty() keeps returning false and 
> the Balancer hangs.



