Also, see the balancer tool that comes with Hadoop. This background process should be run periodically (every week or so?) to make sure that data is evenly distributed:
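For reference, a minimal sketch of how one might kick it off (assuming the 0.19-era `bin/hadoop` launcher; the threshold value below is just an example, not a recommendation):

```shell
# Run the balancer from the Hadoop install directory.
# -threshold is the allowed deviation (in percent) of each datanode's
# disk usage from the cluster average; 10 is the default.
bin/hadoop balancer -threshold 10

# The balancer runs until the cluster is balanced (or it can make no
# further progress). It can be stopped at any time with Ctrl-C, or via
# the helper script if your distribution ships it:
bin/stop-balancer.sh
```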
http://hadoop.apache.org/core/docs/r0.19.0/hdfs_user_guide.html#Rebalancer

- Aaron

On Sat, Jan 24, 2009 at 7:40 PM, jason hadoop <[email protected]> wrote:

> The blocks will be invalidated on the returned-to-service datanode.
> If you want to save your namenode and network a lot of work, wipe the HDFS
> block storage directory before returning the datanode to service.
> dfs.data.dir will be the directory; most likely the value is
> ${hadoop.tmp.dir}/dfs/data
>
> Jason - Ex Attributor
>
> On Sat, Jan 24, 2009 at 6:19 PM, C G <[email protected]> wrote:
>
> > Hi All:
> >
> > I elected to take a node out of one of our grids for service. Naturally
> > HDFS recognized the loss of the DataNode and did the right stuff, fixing
> > replication issues and ultimately delivering a clean file system.
> >
> > So now the node I removed is ready to go back into service. When I return
> > it to service, a bunch of files will suddenly have a replication of 4
> > instead of 3. My questions:
> >
> > 1. Will HDFS delete a copy of the data to bring replication back to 3?
> > 2. If (1) above is yes, will it remove the copy by deleting from other
> > nodes, or will it remove files from the returned node, or both?
> >
> > The motivation for asking these questions is that I have a file system
> > which is extremely unbalanced - we recently doubled the size of the grid
> > with a few dozen terabytes already stored on the existing nodes. I am
> > wondering if an easy way to restore some sense of balance is to cycle
> > through the old nodes, removing each one from service for several hours
> > and then returning it to service.
> >
> > Thoughts?
> >
> > Thanks in Advance,
> > C G
