Re: [jira] Created: (HADOOP-1079) DFS Scalability: optimize processing time of block reports

Nigel Daley Wed, 07 Mar 2007 19:08:11 -0800

This means that the namenode processes a block report once every 2 seconds.

This is an average. In actual fact, aren't the DNs all trying to send them around the same time, since they're started at roughly the same time? Perhaps a DN's first block report (after registration) should be sent at something like


  (blockInterval + randomInt) % blockInterval

seconds and then its subsequent block reports sent every blockInterval seconds after its previous successful block report.
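A minimal sketch of that schedule in Java, to make the idea concrete. The class and method names (`BlockReportSchedule`, `firstReportDelay`) are made up for illustration and are not taken from the DFS code. The one-hour interval comes from the issue description, and the use of `Math.floorMod` is an assumption to keep the delay non-negative if the random value happens to be negative.

```java
import java.util.Random;

// Hypothetical helper, not part of Hadoop: spreads first block reports
// across the interval so that datanodes started together do not all
// report at the same moment.
final class BlockReportSchedule {

    // Delay in seconds before a datanode's first block report:
    // (blockInterval + randomInt) % blockInterval.
    // floorMod keeps the result in [0, blockInterval) even for a
    // negative randomInt.
    static long firstReportDelay(long blockInterval, long randomInt) {
        return Math.floorMod(blockInterval + randomInt, blockInterval);
    }

    public static void main(String[] args) {
        long blockInterval = 3600; // seconds; one report per hour, as in the issue
        Random rng = new Random();
        for (int dn = 0; dn < 5; dn++) {
            long delay = firstReportDelay(blockInterval, rng.nextInt(Integer.MAX_VALUE));
            System.out.println("DN " + dn + ": first report after " + delay
                    + " s, then every " + blockInterval + " s");
        }
    }
}
```

Each DN would draw its own random value once, after registration, so the offsets stay spread out for as long as later reports follow at exactly blockInterval seconds.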



On Mar 7, 2007, at 9:59 AM, dhruba borthakur (JIRA) wrote:

DFS Scalability: optimize processing time of block reports
----------------------------------------------------------

                 Key: HADOOP-1079
                 URL: https://issues.apache.org/jira/browse/HADOOP-1079
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
            Reporter: dhruba borthakur

I have a cluster that has 1800 datanodes. Each datanode has around 50000 blocks and sends a block report to the namenode once every hour. This means that the namenode processes a block report once every 2 seconds. Each block report contains all blocks that the datanode currently hosts. This makes the namenode compare a huge number of blocks that remain practically the same between two consecutive reports. This wastes CPU on the namenode.

The problem becomes worse when the number of datanodes increases.

One proposal is to make succeeding block reports (after a successful send of a full block report) be incremental. This will make the namenode process only those blocks that were added/deleted in the last period.
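
A rough sketch of what such an incremental report might carry, assuming the datanode remembers the set of blocks it last reported successfully. The class `IncrementalBlockReport`, its `diff` method, and the use of plain `Long` block IDs are all hypothetical and only illustrate the proposal; they are not existing DFS code.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the proposal: report only what changed
// since the last successful block report.
final class IncrementalBlockReport {
    final Set<Long> added;   // blocks hosted now but absent from the last report
    final Set<Long> deleted; // blocks in the last report that are no longer hosted

    IncrementalBlockReport(Set<Long> added, Set<Long> deleted) {
        this.added = added;
        this.deleted = deleted;
    }

    // Computes the two set differences between the last reported block
    // set and the current one. The inputs are not modified.
    static IncrementalBlockReport diff(Set<Long> lastReported, Set<Long> current) {
        Set<Long> added = new TreeSet<>(current);
        added.removeAll(lastReported);
        Set<Long> deleted = new TreeSet<>(lastReported);
        deleted.removeAll(current);
        return new IncrementalBlockReport(added, deleted);
    }
}
```

With around 50000 blocks per datanode and few changes per hour, both sets would usually be small, so the namenode would touch only those entries instead of comparing the full list.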
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
