[
https://issues.apache.org/jira/browse/HADOOP-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630713#action_12630713
]
Hairong Kuang commented on HADOOP-4116:
---------------------------------------
Proposed changes to the Balancer:
1. Remove the use of the Semaphore at DataNodes. Instead, a DataNode uses a
counter to manage the number of concurrent block moves. On receiving a block
move request while the maximum number of block moves is already in progress,
it rejects the request immediately.
2. Let the receiver initiate the block move. The sender rejects the request
when the maximum number of moves has already been reached. As a result, when
either the sender or the receiver does not have the resources to handle a block
move, the block content is not transferred across the network.
3. The balancer does not set a timeout on a socket. Instead, it sets the
KeepAlive option on the socket. So a block move does not time out no matter how
slowly it goes, and the next phase of scheduling does not start while there is
a pending block move.
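
To illustrate item 1, here is a minimal sketch of a counter that rejects
immediately instead of blocking a thread. It is not the actual patch: the class
name BlockMoveThrottle and its methods are hypothetical, and the real change
may track the count differently.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of Hadoop): caps the number of concurrent
 * block moves without ever blocking the calling thread.
 */
final class BlockMoveThrottle {
    private final int maxMoves;
    private final AtomicInteger activeMoves = new AtomicInteger(0);

    BlockMoveThrottle(int maxMoves) {
        if (maxMoves < 1) {
            throw new IllegalArgumentException("maxMoves must be >= 1");
        }
        this.maxMoves = maxMoves;
    }

    /**
     * Reserves a slot and returns true, or returns false right away when
     * maxMoves moves are already in progress. Never waits.
     */
    boolean tryAcquire() {
        while (true) {
            int current = activeMoves.get();
            if (current >= maxMoves) {
                return false; // caller should reject the request
            }
            if (activeMoves.compareAndSet(current, current + 1)) {
                return true;
            }
            // Another thread changed the count; re-read and retry.
        }
    }

    /** Frees a slot; call exactly once per successful tryAcquire(). */
    void release() {
        activeMoves.decrementAndGet();
    }

    /** Number of block moves currently in progress. */
    int active() {
        return activeMoves.get();
    }
}
```

A request handler would call tryAcquire() on arrival, send a rejection if it
returns false, and otherwise perform the move and call release() in a finally
block. Unlike acquiring a semaphore, a rejected request gives its thread back
at once.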
> Balancer should provide better resource management
> --------------------------------------------------
>
> Key: HADOOP-4116
> URL: https://issues.apache.org/jira/browse/HADOOP-4116
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.17.0
> Reporter: Raghu Angadi
> Assignee: Hairong Kuang
>
> The number of threads is currently limited on datanodes. Once these threads
> are occupied, DataNode does not accept any more requests (DOS). Recently we
> saw a case where most of the 256 threads were waiting in
> {{DataXceiver.replaceBlock()}} trying to acquire {{balancingSem}}. Since
> rebalancing is (heavily) throttled, I would think this would be the common
> case.
> These operations waiting for active rebalancing threads to finish need not
> take up a thread.