[ https://issues.apache.org/jira/browse/HDFS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479304#comment-13479304 ]

Kihwal Lee commented on HDFS-4075:
----------------------------------

We had a group of 40 nodes that were decommissioned and then recommissioned. When they were recommissioned by refreshing nodes using dfsadmin, there were over 5M over-replicated blocks, so the NN (RPC handler) went through each of them while holding the write lock and generated two log messages per block. That took about 5 minutes, and over 2GB of logs were written. Because of the locking, the namenode was unresponsive the whole time.

I tested the performance of the commons-logging + log4j FileAppender family combination, and it was clear that the above case was hitting the logging bottleneck. When comparing logging a single character vs. 400 bytes, the time to finish logging 1,000,000 messages did not differ much. It was not IO bound but CPU bound, as the CPU stayed at 100% the whole time. Changing FileAppender properties affected the timing a bit, but not a lot. It seems this is the inherent limit of this logging mechanism.

For single-character logging, each message took 19-23us, or about 42K logs/sec with the CPU at 100% and almost no IO wait time. At that rate, the roughly 10M messages (two per block for over 5M blocks) would take about 4 minutes, so we can see that the namenode in the case given above was spending almost all of its time logging. The IO overhead was not significant.

> Reduce recommissioning overhead
> -------------------------------
>
>                 Key: HDFS-4075
>                 URL: https://issues.apache.org/jira/browse/HDFS-4075
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.4, 2.0.2-alpha
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>
> When datanodes are recommissioned,
> {BlockManager#processOverReplicatedBlocksOnReCommission()} is called for each
> rejoined node and excess blocks are added to the invalidate list. The problem
> is this is done while the namesystem write lock is held.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira