Unless something has changed recently, it won't automatically relocate
the blocks.  When I did something similar, I used a script that walked
through the whole set of misreplicated files, bumped the replication
factor up, and then dropped it back down.  That triggered relocation of
the blocks to meet the rack placement requirements.

Doing this worked, but took about a week to run over a few hundred
thousand files that were misreplicated.

Here's the script I used (with all the usual caveats: it assumes a
replication factor of 3, has no real error handling, etc)...

# Find files that violate the rack placement policy, then bump their
# replication up and back down so the NameNode re-places the blocks.
for f in `hadoop fsck / | grep "Replica placement policy is violated" \
    | head -n 80000 | awk -F: '{print $1}'`; do
    hadoop fs -setrep -w 4 "$f"   # raise to 4 and wait for the extra replica
    hadoop fs -setrep 3 "$f"      # drop back to 3; the excess replica is removed
done
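
If you want to track progress while that runs, a quick spot check (just a
sketch, assuming fsck reports violations the same way as above) is to count
how many files still violate placement:

hadoop fsck / | grep -c "Replica placement policy is violated"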


On Tue, Mar 20, 2012 at 16:20, Patai Sangbutsarakum
<silvianhad...@gmail.com> wrote:
> Hadoopers!!
>
> I am going to restart the hadoop cluster to enable rack-awareness for
> the first time.
> Currently we're running 0.20.203 with 500TB of data on 250+ nodes
> (without rack-awareness).
>
> I am thinking, and afraid, that when I start dfs (with rack-awareness
> enabled) HDFS will be in safemode for hours, busy relocating blocks to
> comply with rack-awareness.
>
> Is there any knob I can dial to prevent that?
>
> Thanks in advance
> Patai
