Hi all, I remember there is a parameter that can turn this off, i.e., one that stops a TaskTracker from keeping blocks from other DataNodes after a MapReduce job has finished.
I ran into a problem when using hadoop-0.21.0. First, I balanced the cluster by the number of blocks on each DataNode. For example, under "/user/test/" I have 100 blocks of data with a replication factor of 2, so there are 200 block replicas in total under "/user/test". I have 10 DataNodes, and I arranged for each DataNode to hold 20 of those replicas. However, after about 300 MapReduce jobs had finished, I found that the number of blocks on the DataNodes had changed: it is no longer 20 on every DataNode; some have 21 and some have 19. I had turned off the Hadoop balancer. What could cause this? Any suggestions would be appreciated!

Best wishes!
Chen
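To make the arithmetic concrete (100 blocks × replication 2 = 200 replicas, spread over 10 DataNodes = 20 each), here is a minimal Python sketch that counts replicas per DataNode and reports which nodes drift from the even split. The input format (a dict mapping block IDs to the DataNodes holding a replica) and the helper names are hypothetical; the block locations would have to be collected separately, for example from `hadoop fsck /user/test -files -blocks -locations`.

```python
from collections import Counter


def replicas_per_datanode(block_locations):
    """Count how many block replicas each DataNode holds.

    block_locations: hypothetical input, {block_id: [datanode, ...]}.
    """
    counts = Counter()
    for nodes in block_locations.values():
        counts.update(nodes)
    return counts


def imbalance_report(block_locations, datanodes):
    """Return {datanode: actual - expected} for nodes that deviate from an even split."""
    counts = replicas_per_datanode(block_locations)
    expected = sum(counts.values()) / len(datanodes)  # 200 / 10 = 20.0 here
    return {dn: counts[dn] - expected
            for dn in datanodes
            if counts[dn] != expected}


if __name__ == "__main__":
    # 100 blocks, replication 2, 10 DataNodes, placed round-robin: 20 replicas each.
    blocks = {f"blk_{i}": [f"dn{i % 10}", f"dn{(i + 1) % 10}"] for i in range(100)}
    nodes = [f"dn{k}" for k in range(10)]
    print(imbalance_report(blocks, nodes))  # {}
    # Move one replica of blk_0 from dn0 to dn2: one node drops to 19, another rises to 21.
    blocks["blk_0"] = ["dn2", "dn1"]
    print(imbalance_report(blocks, nodes))  # {'dn0': -1.0, 'dn2': 1.0}
```

A single replica landing on a different DataNode is enough to produce exactly the 19/21 pattern described above, even though the total number of replicas stays at 200.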