[ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382189#comment-16382189
 ] 

Erik Krogen commented on HDFS-13183:
------------------------------------

I'm not sure that a new exception specific to this situation is the right 
move. Ideally the client (in this case the Balancer), rather than the NN, 
should make the decision. For example, if the SbNN goes down, the ANN is not 
aware of it, yet the Balancer should start reading from the ANN instead of the 
SbNN. The current approach cannot handle that situation. It may work as an 
interim solution until HDFS-12976 is developed, but in that case I would rather 
reuse {{StandbyException}} and update its comment than create a new exception 
class. This also has better compatibility. Ping [~shv] for an opinion on this 
approach.
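
To illustrate the client-side decision described above, here is a minimal sketch in which the Balancer prefers the SbNN and falls back to the ANN when the SbNN call fails. {{BlockSource}} and {{getBlocksPreferStandby}} are hypothetical names invented for this illustration, not part of the patch or of HDFS, and block lists are modelled as plain strings.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch: the client, not the NameNode, decides where to read from. */
final class BalancerReadFallbackSketch {

  /** Assumed stand-in for a NameNode proxy that can serve getBlocks. */
  interface BlockSource {
    List<String> getBlocks(String datanode) throws IOException;
  }

  /**
   * Try the standby first; if that call fails for any I/O reason
   * (for example the standby is down), retry against the active.
   */
  static List<String> getBlocksPreferStandby(
      BlockSource standby, BlockSource active, String datanode) throws IOException {
    try {
      return standby.getBlocks(datanode);
    } catch (IOException e) {
      // The active NameNode never has to know that the standby is unavailable.
      return active.getBlocks(datanode);
    }
  }
}
```

With this shape, an SbNN outage is handled entirely on the Balancer side, which is the case the NN-driven approach cannot cover.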

Additional comments on the patch:
* I realized that changing {{checkOperation}} to {{UNCHECKED}} in all cases is 
wrong as that will allow {{getBlocks}} to be performed against the SbNN even if 
the new config is disabled. For now the only thing that comes to mind is to do 
something like {{checkOperation(balancerShouldRequestStandby ? UNCHECKED : 
READ)}}, but I'm not too fond of it. Open to better ideas. It may be that we 
want to create a new {{OperationCategory.STANDBY_READ}} and then use 
{{checkOperation(balancerShouldRequestStandby ? STANDBY_READ : READ)}}; this 
could do away with the explicit check of the service state.
* In the test, we should confirm that the balancer actually fails over to the 
SbNN, and that it is able to appropriately get blocks and trigger data movement 
as a result.
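
A rough sketch of the {{STANDBY_READ}} idea from the first bullet follows. It is a simplified, self-contained model rather than the real HDFS classes: the enums and {{StandbyException}} here are local stand-ins, and it assumes that {{STANDBY_READ}} would be permitted only on the standby and that {{READ}}/{{WRITE}} are permitted only on the active (ignoring stale-read settings).

```java
import java.io.IOException;

/** Simplified, hypothetical model of the NameNode HA operation check. */
final class StandbyReadCheckSketch {

  enum HAState { ACTIVE, STANDBY }

  /** STANDBY_READ is the proposed category; the rest mirror a subset of existing ones. */
  enum OperationCategory { UNCHECKED, READ, WRITE, STANDBY_READ }

  /** Local stand-in for the existing StandbyException. */
  static final class StandbyException extends IOException {
    StandbyException(String message) { super(message); }
  }

  /** Category that getBlocks would check, driven by the new config flag. */
  static OperationCategory categoryForGetBlocks(boolean balancerShouldRequestStandby) {
    return balancerShouldRequestStandby
        ? OperationCategory.STANDBY_READ
        : OperationCategory.READ;
  }

  /** Assumed semantics: the category alone decides, so no explicit state check is needed by callers. */
  static void checkOperation(HAState state, OperationCategory op) throws StandbyException {
    boolean allowed;
    switch (op) {
      case UNCHECKED:
        allowed = true;
        break;
      case STANDBY_READ:
        allowed = (state == HAState.STANDBY);
        break;
      default: // READ and WRITE
        allowed = (state == HAState.ACTIVE);
        break;
    }
    if (!allowed) {
      throw new StandbyException("Operation category " + op + " is not supported in state " + state);
    }
  }

  /** Convenience wrapper that reports the outcome as a boolean. */
  static boolean isAllowed(HAState state, OperationCategory op) {
    try {
      checkOperation(state, op);
      return true;
    } catch (StandbyException e) {
      return false;
    }
  }
}
```

Under these assumptions, with the config disabled {{getBlocks}} is rejected on the SbNN exactly as any other {{READ}}, and with it enabled the ANN rejects the call so that the Balancer goes to the SbNN.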

> Standby NameNode process getBlocks request to reduce Active load
> ----------------------------------------------------------------
>
>                 Key: HDFS-13183
>                 URL: https://issues.apache.org/jira/browse/HDFS-13183
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, namenode
>    Affects Versions: 2.7.5, 3.1.0, 2.9.1, 2.8.4, 3.0.2
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>         Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch
>
>
> The performance of the Active NameNode can be impacted when the {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the 
> extreme case, all handlers of the Active NameNode RPC server are occupied by 
> one reader, {{NameNodeRpcServer#getBlocks}}, and other write operation calls, 
> so the Active NameNode enters a state of false death for seconds or even 
> minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.



