This means that the namenode processes a block report once every 2
seconds.
This is an average. In practice, aren't the DNs all trying to send
their reports at around the same time, since they are started at
roughly the same time? Perhaps a DN's first block report (after
registration) should be sent at something like
(blockInterval + randomInt) % blockInterval
seconds, and each subsequent block report sent blockInterval seconds
after its previous successful block report.
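A minimal sketch of that schedule in Java, assuming the interval is
tracked in milliseconds. The class and method names are hypothetical
and are not taken from the Hadoop source. One detail worth noting:
Java's % operator can return a negative value when its left operand is
negative, so the sketch uses Math.floorMod to keep the first delay in
the range [0, blockInterval).

```java
import java.util.Random;

// Hypothetical helper, not part of Hadoop: spreads first block reports
// across the interval so datanodes started together do not report together.
public class BlockReportSchedule {

    // Delay before the first block report, uniform in [0, intervalMs).
    public static long firstReportDelayMs(long intervalMs, Random rng) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        // floorMod never returns a negative value for a positive divisor,
        // unlike the plain % operator.
        return Math.floorMod(rng.nextLong(), intervalMs);
    }

    // Later reports are scheduled relative to the last successful one.
    public static long nextReportTimeMs(long lastSuccessMs, long intervalMs) {
        return lastSuccessMs + intervalMs;
    }

    public static void main(String[] args) {
        long intervalMs = 60L * 60L * 1000L; // one hour, as in the issue
        Random rng = new Random();
        long registeredAtMs = System.currentTimeMillis();
        long firstAtMs = registeredAtMs + firstReportDelayMs(intervalMs, rng);
        long secondAtMs = nextReportTimeMs(firstAtMs, intervalMs);
        System.out.println("first report at  " + firstAtMs);
        System.out.println("second report at " + secondAtMs);
    }
}
```

Because each later report is scheduled from the previous successful
one, the random offset chosen at registration carries forward, and the
reports should stay spread out rather than drifting back together.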
On Mar 7, 2007, at 9:59 AM, dhruba borthakur (JIRA) wrote:
DFS Scalability: optimize processing time of block reports
----------------------------------------------------------
Key: HADOOP-1079
URL: https://issues.apache.org/jira/browse/HADOOP-1079
Project: Hadoop
Issue Type: Bug
Components: dfs
Reporter: dhruba borthakur
I have a cluster that has 1800 datanodes. Each datanode has around
50000 blocks and sends a block report to the namenode once every
hour. This means that the namenode processes a block report once
every 2 seconds. Each block report contains all blocks that the
datanode currently hosts. This makes the namenode compare a huge
number of blocks that remain practically the same between two
consecutive reports. This wastes CPU on the namenode.
The problem becomes worse when the number of datanodes increases.
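The figures quoted above can be checked with a few lines of Java. This
is only the arithmetic from the description (1800 datanodes, about
50000 blocks each, one report per hour); the class and method names are
made up for illustration, and it assumes reports are spread evenly over
the hour.

```java
// Hypothetical helper that reproduces the load figures from the issue text.
public class BlockReportLoad {

    // Average gap between block reports arriving at the namenode.
    public static double secondsBetweenReports(int datanodes, long intervalSeconds) {
        return (double) intervalSeconds / datanodes;
    }

    // Average number of block entries the namenode must compare per second.
    public static long blocksPerSecond(int datanodes, long blocksPerDatanode,
                                       long intervalSeconds) {
        return (long) datanodes * blocksPerDatanode / intervalSeconds;
    }

    public static void main(String[] args) {
        int datanodes = 1800;
        long blocksPerDatanode = 50_000L;
        long intervalSeconds = 3600L; // one report per datanode per hour

        // prints 2.0
        System.out.println(secondsBetweenReports(datanodes, intervalSeconds));
        // prints 25000
        System.out.println(blocksPerSecond(datanodes, blocksPerDatanode, intervalSeconds));
    }
}
```

Both figures scale linearly with the number of datanodes, which is why
the cost grows as the cluster does.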
One proposal is to make succeeding block reports (after a
successful send of a full block report) be incremental. This will
make the namenode process only those blocks that were added/deleted
in the last period.
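One way to picture an incremental report is as a set difference
between the blocks reported last time and the blocks held now. The
sketch below is an illustration under assumptions, not the design the
issue settled on: block IDs are modelled as plain Long values, and the
class and method names are hypothetical rather than Hadoop's own.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of an incremental block report: only the blocks
// added or deleted since the previous successful report are sent.
public class IncrementalBlockReport {
    public final Set<Long> added;
    public final Set<Long> deleted;

    public IncrementalBlockReport(Set<Long> added, Set<Long> deleted) {
        this.added = added;
        this.deleted = deleted;
    }

    // Datanode side: compare the last reported block set with the current one.
    public static IncrementalBlockReport diff(Set<Long> previous, Set<Long> current) {
        Set<Long> added = new HashSet<>(current);
        added.removeAll(previous);   // in current but not in previous
        Set<Long> deleted = new HashSet<>(previous);
        deleted.removeAll(current);  // in previous but not in current
        return new IncrementalBlockReport(added, deleted);
    }

    // Namenode side: update its view of the datanode from the delta alone.
    public static Set<Long> apply(Set<Long> namenodeView, IncrementalBlockReport report) {
        Set<Long> updated = new HashSet<>(namenodeView);
        updated.removeAll(report.deleted);
        updated.addAll(report.added);
        return updated;
    }
}
```

With this shape, the namenode's work per report is proportional to the
number of changed blocks rather than to the total number of blocks the
datanode hosts.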