[jira] Commented: (HADOOP-4116) DataNode : idle rebalancing operations need not take up threads.

Hairong Kuang (JIRA) Fri, 12 Sep 2008 11:29:11 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630647#action_12630647
 ]


Hairong Kuang commented on HADOOP-4116:
---------------------------------------

A closer investigation of the problem shows that the balancer needs additional 
improvements:

(1) The balancer needs to handle block move timeouts better. Currently it simply
assumes that a timed-out move has failed, but it makes no effort to ensure that
the move is interrupted and that the resources it holds are released. The next
scheduling phase may then schedule more blocks to move from the same DataNode,
using more and more resources.
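
To illustrate point (1), here is a minimal sketch of a move that is cancelled on
timeout rather than merely recorded as failed. It is not taken from the Balancer
source: the class TimedBlockMove, the method moveWithTimeout and the counter
activeMoves are hypothetical names, and the transfer is assumed to respond to
thread interruption.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from the Hadoop code base.
class TimedBlockMove {
    // Stand-in for whatever resources a running move holds (threads, permits, sockets).
    static final AtomicInteger activeMoves = new AtomicInteger();

    /** Returns true only if the transfer completed within timeoutMs. */
    static boolean moveWithTimeout(ExecutorService pool, Runnable transfer, long timeoutMs) {
        Future<?> future = pool.submit(() -> {
            activeMoves.incrementAndGet();
            try {
                transfer.run();
            } finally {
                // Released on success, failure, or interruption alike.
                activeMoves.decrementAndGet();
            }
        });
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Do not only mark the move as failed: interrupt the worker as well.
            // Assumption: the transfer reacts to interruption; blocking stream I/O
            // on a socket usually also needs the socket to be closed.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // the transfer itself threw
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        }
    }
}
```

With this shape, the next scheduling phase would only see the source DataNode as
free once the worker has actually left its finally block.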

(2) Resource control for balancing at DataNodes should use a fair Semaphore.
Currently it uses an unfair Semaphore, which makes no guarantees about the order
in which threads acquire permits: a thread invoking acquire() can be allocated a
permit ahead of a thread that has been waiting. Therefore, if a dfs cluster has
many DataNodes with long queues of block move requests, it is very likely to
enter the following state: a thread in DataNode A holds a permit and asks
DataNode B to receive a block, while a thread in DataNode B holds a permit and
asks DataNode A to receive a block. Although the block move from B to A was
scheduled much later than the move from A to B, they may be executed
simultaneously. Both block receives are blocked on acquiring a permit, assuming
only one permit can be issued, so a deadlock occurs.
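
For reference on point (2), java.util.concurrent.Semaphore takes a fairness flag
as its second constructor argument. The sketch below only shows how a fair
throttle could be built; the class BalancingThrottle and its methods are
hypothetical and do not reflect the actual DataNode code (only the name
balancingSem appears in the issue description).

```java
import java.util.concurrent.Semaphore;

// Hypothetical wrapper; only the Semaphore API itself is real.
class BalancingThrottle {
    private final Semaphore balancingSem;

    BalancingThrottle(int permits) {
        // 'true' requests fair mode: waiting threads get permits in FIFO order.
        // The one-argument constructor creates an unfair semaphore.
        this.balancingSem = new Semaphore(permits, true);
    }

    boolean isFair() {
        return balancingSem.isFair();
    }

    int availablePermits() {
        return balancingSem.availablePermits();
    }

    /** Runs one block move while holding a permit, releasing it afterwards. */
    void runThrottled(Runnable blockMove) throws InterruptedException {
        // Note: the untimed tryAcquire() would barge ahead of waiters even in
        // fair mode, so acquire() is used here.
        balancingSem.acquire();
        try {
            blockMove.run();
        } finally {
            balancingSem.release();
        }
    }
}
```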

> DataNode : idle rebalancing operations need not take up threads.
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4116
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4116
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Raghu Angadi
>
> The number of threads is currently limited on DataNodes. Once these threads 
> are occupied, a DataNode does not accept any more requests (DoS). Recently we 
> saw a case where most of the 256 threads were waiting in 
> {{DataXceiver.replaceBlock()}}, trying to acquire {{balancingSem}}. Since 
> rebalancing is (heavily) throttled, I would think this would be the common 
> case. 
> Operations that are waiting for active rebalancing threads to finish need not 
> take up a thread. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
