Hi!

I'm facing a problem where datanodes are marked as dead because their block reports take too long, which in turn is due to too many blocks per node, i.e. https://issues.apache.org/jira/browse/HADOOP-4584. Unfortunately I can't easily upgrade to 0.21.

So I came up with a possible workaround: run multiple datanode instances on each physical node, each handling a subset of the disks on that node. I'm not sure it will work, but it could be worth a try.

So I set up a second datanode instance on one of my nodes, running on a different set of ports, with each of the two instances using half of the disks.
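
For reference, the second instance gets its own conf dir whose hdfs-site.xml overrides look roughly like this (the ports are the ones I picked; the dfs.data.dir paths are placeholders for the second half of my disks):

  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50011</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50081</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50021</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/4/dfs,/data/5/dfs,/data/6/dfs</value>
  </property>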

However, when starting up this configuration, I get the exception below (UnregisteredDatanodeException) in the namenode log, and the datanode then logs the same exception and shuts down.

How can I work around this?

Removing the VERSION file in the data dir does not help; the datanode just exits with an exception about the data dir being in an inconsistent state.

Can I simply edit the VERSION files in the data dirs assigned to the new instance, replacing e.g. the port number that's there with the new, correct port number? Or will that confuse the datanode or namenode?
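
For reference, the VERSION file under each data dir's current/ directory looks something like this (the storageID is the real one from my log; the other values are placeholders):

  namespaceID=123456789
  storageID=DS-71308762-10.20.11.66-50010-1269957604444
  cTime=0
  storageType=DATA_NODE
  layoutVersion=-18

The old IP and port (10.20.11.66:50010) are embedded in the storageID string, which is what I was thinking of editing.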

Or should I start the datanode with an empty data dir, let it register with the namenode, immediately shut it down, and then use the VERSION file from the empty data dir as the new VERSION file for all the data dirs that already contain data?
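
I.e., something along these lines (paths and the conf dir name are placeholders):

  # start the second instance against an empty data dir so it
  # registers with the namenode and generates a fresh storage ID
  bin/hadoop-daemon.sh --config conf.dn2 start datanode

  # stop it again right after it has registered
  bin/hadoop-daemon.sh --config conf.dn2 stop datanode

  # copy the freshly generated VERSION file over the existing
  # ones in the data dirs that already contain blocks
  for d in /data/4/dfs /data/5/dfs /data/6/dfs; do
    cp /data/empty/dfs/current/VERSION $d/current/VERSION
  done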

I'm guessing what I'm trying to do is equivalent to moving disks from one host to another, something I can imagine happening in some system administration situations. So what would be the procedure for that?

Any help would be appreciated.

Thanks,
\EF

Full exception in namenode log:


2011-12-07 09:45:16,699 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000, call blockReceived(DatanodeRegistration(10.20.40.14:50011, storageID=DS-71308762-10.20.11.66-50010-1269957604444, infoPort=50081, ipcPort=50021), [Lorg.apache.hadoop.hdfs.protocol.Block;@3aa57508, [Ljava.lang.String;@44a67e4c) from 10.20.40.14:58464: error: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 10.20.40.14:50011 is attempting to report storage ID DS-71308762-10.20.11.66-50010-1269957604444. Node 10.20.40.14:50010 is expected to serve this storage.
org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node 10.20.40.14:50011 is attempting to report storage ID DS-71308762-10.20.11.66-50010-1269957604444. Node 10.20.40.14:50010 is expected to serve this storage.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:3972)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.blockReceived(FSNamesystem.java:3388)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReceived(NameNode.java:776)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
