I think it is better to implement the start-up delay at the name node. The key is that the name node should be able to tell whether it is in a steady state, both at start-up and at runtime after a network disruption. It should not instruct data nodes to replicate or delete any blocks before it has reached a steady state.

Hairong

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 04, 2006 9:58 AM
To: [email protected]
Subject: Re: dfs datanode heartbeats and getBlockwork requests

Eric Baldeschwieler wrote:
> If we moved to a scheme where the name node was just given a small
> number of blocks with each heartbeat, there would be no reason to not
> start reporting blocks immediately, would there?

There would still be a small storm of un-needed replications on startup. Say it takes a minute at startup for all data nodes to report their complete block lists to the name node. If heartbeats are every 3 seconds, then all but the last data node to report in would be handed 20 small lists of blocks to start replicating. And the switches could be saturated doing a lot of un-needed transfers, which would slow startup. Then, for the next minute after startup, the nodes would be told to delete blocks that are now over-replicated.

We'd like startup to be as fast and painless as possible. Waiting a bit before checking to see if blocks are over- or under-replicated seems a good way.

Doug
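
To make the two points in this thread concrete, here is a minimal Java sketch of a steady-state gate at the name node. It is not taken from the Hadoop source: the class `SteadyStateGate`, its methods, and the "quiet period" rule (no previously unseen data node has reported for a configurable interval) are all assumptions chosen for illustration. The static helper only reproduces the arithmetic above, where a 60-second reporting window with 3-second heartbeats gives 20 heartbeats.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not Hadoop's actual name node code.
 * Assumption: the name node counts as "steady" once no previously
 * unseen data node has sent a block report for quietPeriodMs.
 */
class SteadyStateGate {
    private final long quietPeriodMs;
    private final Set<String> seenNodes = new HashSet<>();
    private long lastChangeMs;

    SteadyStateGate(long quietPeriodMs, long startMs) {
        this.quietPeriodMs = quietPeriodMs;
        this.lastChangeMs = startMs;
    }

    /** Record a block report; only a first report from a node restarts the quiet period. */
    void onBlockReport(String nodeId, long nowMs) {
        if (seenNodes.add(nodeId)) {
            lastChangeMs = nowMs;
        }
    }

    /** After a network disruption, forget what was seen and wait again. */
    void onNetworkDisruption(long nowMs) {
        seenNodes.clear();
        lastChangeMs = nowMs;
    }

    /** True once the quiet period has elapsed with no newly reporting node. */
    boolean isSteady(long nowMs) {
        return nowMs - lastChangeMs >= quietPeriodMs;
    }

    /** Replication and deletion work is handed out only in a steady state. */
    boolean mayIssueBlockWork(long nowMs) {
        return isSteady(nowMs);
    }

    /** Number of heartbeats that fit in the reporting window (integer division). */
    static long heartbeatsDuring(long windowSeconds, long heartbeatSeconds) {
        return windowSeconds / heartbeatSeconds;
    }
}
```

With a gate like this, a heartbeat handler would check `mayIssueBlockWork(now)` before returning any block work, so the early heartbeats during start-up (up to `heartbeatsDuring(60, 3)` of them in the example above) carry no replication or deletion requests. The same check covers the runtime case if `onNetworkDisruption` is called when the name node detects that many data nodes dropped out at once.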
