[ 
https://issues.apache.org/jira/browse/HDFS-14171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728387#comment-16728387
 ] 

maobaolong edited comment on HDFS-14171 at 12/24/18 12:47 PM:
--------------------------------------------------------------

[~kennethlnnn] At my company we run a cluster of 10k nodes. After we upgraded 
it from Hadoop-2.7.1 to Hadoop-3.2, performance degraded seriously.

I think this is the reason. In our cluster we do set the datanodeThreshold key 
to 0, because we consider enough reported blocks sufficient to exit safe mode 
safely. 

Thank you for the improvement. It is a great way to resolve our performance 
problem.


> Performance improvement in Tailing EditLog
> ------------------------------------------
>
>                 Key: HDFS-14171
>                 URL: https://issues.apache.org/jira/browse/HDFS-14171
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.9.0, 3.0.0-alpha1
>            Reporter: Kenneth Yang
>            Priority: Minor
>         Attachments: HDFS-14171.000.patch
>
>
> Stack:
> {code:java}
> Thread 456 (Edit log tailer):
> State: RUNNABLE
> Blocked count: 1139
> Waited count: 12
> Stack:
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getNumLiveDataNodes(DatanodeManager.java:1259)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode.areThresholdsMet(BlockManagerSafeMode.java:570)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode.checkSafeMode(BlockManagerSafeMode.java:213)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode.adjustBlockTotals(BlockManagerSafeMode.java:265)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.completeBlock(BlockManager.java:1087)
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:1118)
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1126)
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:468)
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:258)
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:892)
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> Thread 455 (pool-16-thread-1):
> {code}
> Code:
> {code:java}
> private boolean areThresholdsMet() {
>   assert namesystem.hasWriteLock();
>   int datanodeNum = blockManager.getDatanodeManager().getNumLiveDataNodes();
>   synchronized (this) {
>     return blockSafe >= blockThreshold && datanodeNum >= datanodeThreshold;
>   }
> }
> {code}
> According to the code, the value of {color:#ff0000}datanodeNum{color} has to 
> be calculated every time the method areThresholdsMet() is called.  
> However, when {color:#ff0000}datanodeThreshold{color} is equal to 0 (the 
> default value of the configuration), the expression 
> datanodeNum >= datanodeThreshold is always true.
> Calling the method {color:#ff0000}getNumLiveDataNodes(){color} is time 
> consuming on a cluster with 10,000 datanodes. Therefore, we add a condition 
> so that datanodeNum is calculated only when datanodeThreshold is greater 
> than 0, which improves the performance greatly.
>  
>  
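
The change described in the issue above can be illustrated with a minimal, self-contained sketch. It is not the attached patch and may differ from it: SafeModeThresholds is a hypothetical stand-in for BlockManagerSafeMode, liveDatanodeCounter is a hypothetical stand-in for DatanodeManager.getNumLiveDataNodes(), and the namesystem write-lock assertion is left out.

```java
import java.util.function.IntSupplier;

/** Simplified, hypothetical stand-in for BlockManagerSafeMode; not the HDFS class. */
class SafeModeThresholds {
  private final long blockThreshold;
  private final int datanodeThreshold;
  // Hypothetical stand-in for DatanodeManager.getNumLiveDataNodes().
  private final IntSupplier liveDatanodeCounter;
  private long blockSafe;

  SafeModeThresholds(long blockThreshold, int datanodeThreshold,
      IntSupplier liveDatanodeCounter) {
    this.blockThreshold = blockThreshold;
    this.datanodeThreshold = datanodeThreshold;
    this.liveDatanodeCounter = liveDatanodeCounter;
  }

  synchronized void setBlockSafe(long blockSafe) {
    this.blockSafe = blockSafe;
  }

  boolean areThresholdsMet() {
    // With a threshold of 0 the datanode check is trivially true,
    // so the expensive live-datanode count is skipped entirely.
    int datanodeNum = 0;
    if (datanodeThreshold > 0) {
      datanodeNum = liveDatanodeCounter.getAsInt();
    }
    synchronized (this) {
      return blockSafe >= blockThreshold && datanodeNum >= datanodeThreshold;
    }
  }

  public static void main(String[] args) {
    int[] calls = {0}; // counts how often the live-datanode count is requested
    SafeModeThresholds safeMode = new SafeModeThresholds(100, 0, () -> {
      calls[0]++;
      return 10_000;
    });
    safeMode.setBlockSafe(100);
    System.out.println("thresholds met: " + safeMode.areThresholdsMet());
    System.out.println("live-datanode counts performed: " + calls[0]);
  }
}
```

With datanodeThreshold at 0 the counter is never invoked, which is the saving the issue is after when areThresholdsMet() runs for every completed block during edit log tailing. With a positive threshold the behaviour is the same as in the original method.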


