[ https://issues.apache.org/jira/browse/HDFS-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058838#comment-13058838 ]
Tomasz Nykiel commented on HDFS-395:
------------------------------------

Actually, this is a good question. I was assuming that the rename would be relatively cheap. I guess I will need to do some testing.

> DFS Scalability: Incremental block reports
> ------------------------------------------
>
>                 Key: HDFS-395
>                 URL: https://issues.apache.org/jira/browse/HDFS-395
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: blockReportPeriod.patch, explicitDeleteAcks.patch
>
>
> I have a cluster that has 1800 datanodes. Each datanode has around 50000
> blocks and sends a block report to the namenode once every hour. This means
> that the namenode processes a block report once every 2 seconds. Each block
> report contains all blocks that the datanode currently hosts. This makes the
> namenode compare a huge number of blocks that practically remains the same
> between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make succeeding block reports (after a successful send of
> a full block report) be incremental. This will make the namenode process only
> those blocks that were added/deleted in the last period.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
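A minimal Python sketch of the incremental-report idea from the quoted issue description. It is not the HDFS implementation: `BlockReport`, `DatanodeReporter`, `NamenodeBlockMap` and `report_interval_seconds` are hypothetical names, block IDs are plain integers, and the sketch assumes no block changes happen between building a report and acknowledging it. It also reproduces the arithmetic from the description: 1800 datanodes each reporting once per 3600 s means the namenode sees one report every 2 s.

```python
from dataclasses import dataclass


def report_interval_seconds(num_datanodes: int, period_seconds: float = 3600.0) -> float:
    """Average gap between block reports arriving at the namenode when every
    datanode reports once per period and reports are evenly spread."""
    return period_seconds / num_datanodes


@dataclass(frozen=True)
class BlockReport:
    """Hypothetical report format: a full report lists every hosted block in
    `added`; an incremental report lists only the changes since the last ack."""
    full: bool
    added: frozenset
    deleted: frozenset


class DatanodeReporter:
    """Hypothetical datanode-side tracker. Assumes no block changes happen
    between next_report() and ack() (i.e. the send is synchronous)."""

    def __init__(self) -> None:
        self.blocks: set = set()
        self._added: set = set()    # new since the last acknowledged report
        self._deleted: set = set()  # removed since the last acknowledged report
        self._sent_full = False

    def add_block(self, block_id: int) -> None:
        if block_id in self.blocks:
            return
        self.blocks.add(block_id)
        if block_id in self._deleted:
            self._deleted.discard(block_id)  # deleted then re-added: net no change
        else:
            self._added.add(block_id)

    def delete_block(self, block_id: int) -> None:
        if block_id not in self.blocks:
            return
        self.blocks.discard(block_id)
        if block_id in self._added:
            self._added.discard(block_id)  # added then deleted: namenode never knew
        else:
            self._deleted.add(block_id)

    def next_report(self) -> BlockReport:
        # Until one full report has been acknowledged, keep sending full reports.
        if not self._sent_full:
            return BlockReport(True, frozenset(self.blocks), frozenset())
        return BlockReport(False, frozenset(self._added), frozenset(self._deleted))

    def ack(self) -> None:
        """Call only after the namenode accepted the report. If a send fails,
        skip this call and the deltas carry over into the next report."""
        self._sent_full = True
        self._added.clear()
        self._deleted.clear()


class NamenodeBlockMap:
    """Hypothetical namenode-side view of which blocks each datanode holds."""

    def __init__(self) -> None:
        self.replicas: dict = {}

    def process(self, datanode: str, report: BlockReport) -> int:
        """Apply a report and return how many block entries were examined."""
        if report.full:
            self.replicas[datanode] = set(report.added)
        else:
            known = self.replicas.setdefault(datanode, set())
            known |= report.added
            known -= report.deleted
        return len(report.added) + len(report.deleted)


if __name__ == "__main__":
    print(report_interval_seconds(1800))  # 2.0
    dn, nn = DatanodeReporter(), NamenodeBlockMap()
    for b in range(50_000):
        dn.add_block(b)
    print(nn.process("dn1", dn.next_report()))  # 50000 (full report)
    dn.ack()
    dn.add_block(50_000)
    dn.delete_block(7)
    print(nn.process("dn1", dn.next_report()))  # 2 (incremental report)
    dn.ack()
    assert nn.replicas["dn1"] == dn.blocks
```

With roughly 50000 blocks per datanode, the full report costs the namenode 50000 comparisons, while the incremental one costs only as many entries as changed in the period. Gating `ack()` on a successful send mirrors the "after a successful send of a full block report" condition in the proposal: a lost report just means the next one carries the accumulated changes.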