[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382189#comment-16382189 ]
Erik Krogen commented on HDFS-13183:
------------------------------------

I'm not sure that a new exception specific to this situation is the right move. Ideally the client (in this case the Balancer), rather than the NN, should make the decision. For example, if the SbNN goes down, the ANN is not aware of this, but the balancer should start reading from the ANN instead of the SbNN. The current approach cannot handle such a situation.

The current handling may work as an interim solution until HDFS-12976 is developed, but in that case I would rather reuse {{StandbyException}} and just update its comment than create a new class of exception. This also has better compatibility. Ping [~shv] for an opinion on this approach.

Additional comments on the patch:
* I realized that changing {{checkOperation}} to {{UNCHECKED}} in all cases is wrong, as that will allow {{getBlocks}} to be performed against the SbNN even if the new config is disabled. For now the only thing that comes to mind is something like {{checkOperation(balancerShouldRequestStandby ? UNCHECKED : READ)}}, but I'm not too fond of it. Open to better ideas. It may be that we want to create a new {{OperationCategory.STANDBY_READ}} and then use {{checkOperation(balancerShouldRequestStandby ? STANDBY_READ : READ)}}; this could do away with the explicit check of the service state.
* In the test, we should confirm that the balancer actually fails over to the SbNN, and that it is able to get blocks appropriately and trigger data movement as a result.
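To make the {{STANDBY_READ}} idea concrete, here is a minimal, self-contained sketch. It is a model only, not Hadoop code: the enums, the local {{StandbyException}} stand-in, and the {{isAllowed}} helper are hypothetical, and the rule that a standby serves only {{UNCHECKED}} and {{STANDBY_READ}} while an active serves everything is an assumption made for illustration.

```java
import java.io.IOException;

/** Illustrative model only; none of these types are the real Hadoop classes. */
class StandbyReadSketch {

    /** Hypothetical categories, including the proposed STANDBY_READ. */
    enum OperationCategory { UNCHECKED, READ, WRITE, STANDBY_READ }

    /** Simplified HA service states. */
    enum HAState { ACTIVE, STANDBY }

    /** Local stand-in for an exception signalling "this node is in standby". */
    static class StandbyException extends IOException {
        StandbyException(String msg) { super(msg); }
    }

    /** Assumed rule: active serves everything; standby serves UNCHECKED and STANDBY_READ only. */
    static void checkOperation(HAState state, OperationCategory op) throws StandbyException {
        if (state == HAState.ACTIVE) {
            return;
        }
        if (op == OperationCategory.UNCHECKED || op == OperationCategory.STANDBY_READ) {
            return;
        }
        throw new StandbyException(
            "Operation category " + op + " is not supported in state " + state);
    }

    /** getBlocks picks its category from the (hypothetical) config flag. */
    static String getBlocks(HAState state, boolean balancerShouldRequestStandby)
            throws StandbyException {
        checkOperation(state, balancerShouldRequestStandby
            ? OperationCategory.STANDBY_READ
            : OperationCategory.READ);
        return "blocks served by " + state;
    }

    /** Convenience wrapper: true if getBlocks would be served rather than rejected. */
    static boolean isAllowed(HAState state, boolean balancerShouldRequestStandby) {
        try {
            getBlocks(state, balancerShouldRequestStandby);
            return true;
        } catch (StandbyException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("standby, config on:  " + isAllowed(HAState.STANDBY, true));
        System.out.println("standby, config off: " + isAllowed(HAState.STANDBY, false));
        System.out.println("active,  config off: " + isAllowed(HAState.ACTIVE, false));
    }
}
```

With the config disabled, {{getBlocks}} falls back to {{READ}} and the standby rejects it, which is the behaviour that the blanket {{UNCHECKED}} change loses. No separate service-state check is needed inside {{getBlocks}} in this model.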
> Standby NameNode process getBlocks request to reduce Active load
> ----------------------------------------------------------------
>
>                 Key: HDFS-13183
>                 URL: https://issues.apache.org/jira/browse/HDFS-13183
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, namenode
>    Affects Versions: 2.7.5, 3.1.0, 2.9.1, 2.8.4, 3.0.2
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>         Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch,
> HDFS-13183-trunk.003.patch
>
>
> The performance of the Active NameNode can be impacted when the {{Balancer}}
> requests #getBlocks, since querying the blocks of overly full DNs is currently
> extremely inefficient. The main reason is that {{NameNodeRpcServer#getBlocks}}
> holds the read lock for a long time. In the extreme case, all handlers of the
> Active NameNode RPC server are occupied by one reader,
> {{NameNodeRpcServer#getBlocks}}, and other write operation calls, so the Active
> NameNode enters a state of false death for a number of seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could
> speed up the progress of balancing and reduce the performance impact on the
> Active NameNode.