Hi Tom,

Thanks for the trick :).

I tried setting the replication to 3 in hadoop-default.xml, but then the namenode log file in /var/log/hadoop started filling up with the messages marked in bold:

2009-06-24 14:39:06,338 INFO org.apache.hadoop.dfs.StateChange: STATE* SafeModeInfo.leave: Safe mode is OFF.
2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE* Network topology has 1 racks and 3 datanodes
2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE* UnderReplicatedBlocks has 48545 blocks
2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4602580985572290582 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4602036196619511999 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601863051065326105 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601770656364938220 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:10,829 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.20.11.44:50010 is added to blk_-4601770656364938220
2009-06-24 14:39:10,832 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4601706607039808418 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:10,833 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4601652202073012439 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601470720696217621 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601267705629076611 to datanode(s) 10.20.11.44:50010
*2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1*

It is a very small cluster with limited disk space. The disk was filling up because of all these extra messages being written to the log file. Eventually the file system would fill up and Hadoop would hang. This happened when I set dfs.replication = 3 in hadoop-default.xml and restarted the cluster.

Is there a way I can turn off these WARN messages that are filling up the file system? I can run the command on the command line as you advised with replication set to 3 and then, once done, set it back to 2.
Currently dfs.replication is set to 2 in hadoop-default.xml.
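One thing I am considering (just a sketch, and I have not verified the logger name applies on this version) is raising the log threshold for the FSNamesystem logger in conf/log4j.properties, since the WARN lines above come from org.apache.hadoop.fs.FSNamesystem:

```properties
# Sketch: suppress the WARN-level replica-placement messages by only
# logging ERROR and above for the FSNamesystem logger. The logger name
# is taken from the log excerpt above; whether this exact property works
# on this Hadoop version is an assumption.
log4j.logger.org.apache.hadoop.fs.FSNamesystem=ERROR
```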

Thanks,
Usman

Hi Usman,

Before the rebalancer was introduced one trick people used was to
increase the replication on all the files in the system, wait for
re-replication to complete, then decrease the replication to the
original level. You can do this using hadoop fs -setrep.
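For example, something along these lines (a sketch; check `hadoop fs -help` for the exact flags on your version, as -R and -w availability in 0.15.x is an assumption):

```
# Raise replication to 3 on everything under /, recursively,
# waiting (-w) for re-replication to complete:
hadoop fs -setrep -R -w 3 /

# Then drop it back to the original level:
hadoop fs -setrep -R 2 /
```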

Cheers,
Tom

On Thu, Jun 25, 2009 at 10:33 AM, Usman Waheed <usm...@opera.com> wrote:
Hi,

One of our test clusters is running Hadoop 0.15.3 with the replication level set
to 2. The datanodes are not balanced at all.

Datanode_1: 52%
Datanode_2: 82%
Datanode_3: 30%

0.15.3 does not have the rebalancer capability; we are planning to upgrade, but
not right now.

If I take Datanode_1 out of the cluster (decommission it for some time), will
Hadoop rebalance so that Datanode_2 and Datanode_3 even out at around 56%?
Then I can re-introduce Datanode_1 into the cluster.
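(The 56% is just the mean of the two remaining nodes' current usage, assuming equal disk sizes on all datanodes; it does not account for any blocks that would have to be re-replicated off Datanode_1.)

```python
# Where the 56% figure comes from: the mean of Datanode_2 and
# Datanode_3's current disk usage, assuming equal disk capacity.
datanode_2 = 82  # percent used
datanode_3 = 30  # percent used
balanced = (datanode_2 + datanode_3) / 2
print(balanced)  # 56.0
```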

Comments/Suggestions please?

Thanks,
Usman
