If we moved to a scheme where the name node was given just a small
number of blocks with each heartbeat, there would be no reason not to
start reporting blocks immediately, would there? Or the name node could
respond to each heartbeat with the block range it wants in the next
heartbeat...
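
A rough sketch of the block-range idea, to make it concrete. Everything here is hypothetical: IncrementalBlockReporter, chunkFor and nextStart are illustrative names, not existing dfs code, and the wrap-around policy is just one assumption about how the name node might walk the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: report a small slice of blocks per heartbeat,
// with the name node choosing where the next slice starts.
class IncrementalBlockReporter {
    private final List<Long> blockIds; // all block ids held by this datanode
    private final int chunkSize;       // number of blocks sent per heartbeat

    IncrementalBlockReporter(List<Long> blockIds, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.blockIds = new ArrayList<>(blockIds);
        this.chunkSize = chunkSize;
    }

    // Datanode side: the slice to attach to the next heartbeat, starting
    // at the offset the name node asked for in its previous reply.
    List<Long> chunkFor(int requestedStart) {
        if (blockIds.isEmpty()) {
            return new ArrayList<>();
        }
        int start = Math.floorMod(requestedStart, blockIds.size());
        int end = Math.min(start + chunkSize, blockIds.size());
        return new ArrayList<>(blockIds.subList(start, end));
    }

    // Name node side (assumed policy): ask for the range right after the
    // one just received, wrapping to 0 once the whole list has been seen.
    int nextStart(int requestedStart) {
        if (blockIds.isEmpty()) {
            return 0;
        }
        int start = Math.floorMod(requestedStart, blockIds.size());
        int end = Math.min(start + chunkSize, blockIds.size());
        return end >= blockIds.size() ? 0 : end;
    }
}
```

With five blocks and a chunk size of two, the datanode would cover its whole list in three heartbeats and then start over, so no separate startup block report would be needed.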
On Apr 3, 2006, at 2:42 PM, Doug Cutting wrote:
Hairong Kuang wrote:
Currently dfs datanodes send heartbeats and getBlockwork requests to the
namenode at the same frequency (once every 3 seconds) after a certain
startup time. Is there any design reason that we need two separate
messages instead of one? I am thinking that if we let a sendHeartbeat
request return the blocks to be deleted or replicated, we could cut the
network traffic in dfs.
No, that sounds like a reasonable change to me.
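
For illustration only, here is a minimal sketch of what a merged call might look like. HeartbeatReply and NameNodeSketch are made-up names, and the sendHeartbeat signature shown is an assumption, not the existing dfs interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reply type: the work a datanode would otherwise have had
// to fetch with a separate getBlockwork request.
class HeartbeatReply {
    final List<Long> blocksToDelete;
    final List<Long> blocksToReplicate;

    HeartbeatReply(List<Long> blocksToDelete, List<Long> blocksToReplicate) {
        this.blocksToDelete = blocksToDelete;
        this.blocksToReplicate = blocksToReplicate;
    }
}

// Hypothetical namenode side: one call records liveness and hands back
// whatever work is queued for that datanode.
class NameNodeSketch {
    private final Map<String, Long> lastHeartbeatMillis = new HashMap<>();
    private final Map<String, List<Long>> pendingDeletes = new HashMap<>();
    private final Map<String, List<Long>> pendingReplications = new HashMap<>();

    void scheduleDelete(String datanode, long blockId) {
        pendingDeletes.computeIfAbsent(datanode, k -> new ArrayList<>()).add(blockId);
    }

    void scheduleReplication(String datanode, long blockId) {
        pendingReplications.computeIfAbsent(datanode, k -> new ArrayList<>()).add(blockId);
    }

    // A single round trip replaces the heartbeat plus getBlockwork pair.
    HeartbeatReply sendHeartbeat(String datanode, long nowMillis) {
        lastHeartbeatMillis.put(datanode, nowMillis);
        List<Long> deletes = pendingDeletes.remove(datanode);
        List<Long> replications = pendingReplications.remove(datanode);
        return new HeartbeatReply(
                deletes == null ? new ArrayList<>() : deletes,
                replications == null ? new ArrayList<>() : replications);
    }
}
```

Queued work is drained on the first heartbeat that follows it, so a datanode with nothing to do simply gets two empty lists back.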
The startup delay will need to be re-implemented somehow.
Perhaps we could simply change this to a timer in the namenode on
startup, so that it waits a while on startup before giving any
blockwork. We might then have issues if, e.g., the namenode's
ethernet cable were yanked for a few minutes. When it is
reconnected, the namenode will start issuing lots of unneeded
replication requests. Having a delay in blockwork at the datanode
each time it establishes a new connection to the namenode solves
that problem. Are there other cases that the current startup
blockwork delay is handling?
Doug
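
A minimal sketch of the per-connection delay described above, assuming the datanode can tell when it has (re)established a connection. BlockworkGate and its methods are hypothetical names, and the delay value is whatever the startup delay is configured to be.

```java
// Hypothetical sketch: the datanode holds off on blockwork for a fixed
// period after every new connection to the namenode, not just at startup.
class BlockworkGate {
    private final long delayMillis;
    private long connectedAtMillis = -1; // -1 means not currently connected

    BlockworkGate(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Called each time a connection to the namenode is established,
    // which restarts the delay.
    void onConnected(long nowMillis) {
        connectedAtMillis = nowMillis;
    }

    // Called when the connection is lost, e.g. the cable is pulled.
    void onDisconnected() {
        connectedAtMillis = -1;
    }

    // True only once the connection has been up for the full delay.
    boolean mayRequestBlockwork(long nowMillis) {
        return connectedAtMillis >= 0
                && nowMillis - connectedAtMillis >= delayMillis;
    }
}
```

Because the timer restarts on every reconnect, a namenode that drops off the network for a few minutes would not be asked for blockwork until the datanodes have had time to report in again.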