If I understand correctly, a datanode reports its blocks based on the contents of dfs.data.dir.
When you cloned the datanode, you cloned all of its blocks as well, and with them the storage ID the namenode uses to tell nodes apart. When you add a "fresh" datanode to the cluster, you add one that has an empty dfs.data.dir. Try clearing out dfs.data.dir on the clone before adding the new node; you should not need to reformat HDFS from the namenode. A rough sketch of the cleanup is below Steve's original message.

Jeff

On Wed, May 11, 2011 at 1:59 PM, Steve Cohen <mail4st...@gmail.com> wrote:
> Hello,
>
> We are running an hdfs cluster and we decided we wanted to add a new
> datanode. Since we are using a virtual machine, we just cloned an existing
> datanode. We added it to the slaves list and started up the cluster. We
> started getting log messages like this in the namenode log:
>
> 2011-05-11 15:59:44,148 ERROR hdfs.StateChange - BLOCK*
> NameSystem.getDatanode: Data node 10.104.211.58:50010 is attempting to
> report storage ID DS-1360904153-10.104.211.57-50010-1293288346692. Node
> 10.104.211.57:50010 is expected to serve this storage.
> 2011-05-11 15:59:46,975 ERROR hdfs.StateChange - BLOCK*
> NameSystem.getDatanode: Data node 10.104.211.57:50010 is attempting to
> report storage ID DS-1360904153-10.104.211.57-50010-1293288346692. Node
> 10.104.211.58:50010 is expected to serve this storage.
>
> I understand that this is because the datanodes have the exact same
> information, so the first datanode that connects has precedence.
>
> Is it possible to just wipe one of the datanodes so it is blank, or do we
> have to format the entire hdfs filesystem from the namenode to add the new
> datanode?
>
> Thanks,
> Steve Cohen
>
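Here is a minimal sketch of the cleanup, assuming the clone is 10.104.211.58, its datanode process is stopped (e.g. via bin/hadoop-daemon.sh stop datanode), and dfs.data.dir points at /var/hadoop/dfs/data; substitute whatever your hdfs-site.xml actually sets. The idea is just to empty the storage directory so the datanode generates a new storage ID when it next registers:

#!/usr/bin/env python
# Run on the *clone* only, with its datanode stopped.
# DFS_DATA_DIR is an assumed example path -- use your real dfs.data.dir.
import os
import shutil

DFS_DATA_DIR = "/var/hadoop/dfs/data"

# Remove everything under dfs.data.dir, including current/VERSION,
# which is where the cloned storage ID lives, and the cloned block files.
for entry in os.listdir(DFS_DATA_DIR):
    path = os.path.join(DFS_DATA_DIR, entry)
    if os.path.isdir(path):
        shutil.rmtree(path)
    else:
        os.remove(path)

print("cleared %s" % DFS_DATA_DIR)

After that, starting the datanode on the clone should register it with the namenode under a new storage ID, and the "is expected to serve this storage" errors should stop.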