Are you trying to serve blocks from a shared directory (e.g., NFS)?

The storageID for a node is recorded in a file named "VERSION" in
${dfs.data.dir}/current. If one node claims that the storage directory is
already locked, and another node is reporting the first node's storageID, it
makes me think that you have multiple datanodes attempting to use the same
shared directory for storage.
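A quick way to confirm: grep the storageID out of each node's VERSION file and look for duplicates. A minimal sketch, using throwaway mock directories under /tmp in place of each node's real ${dfs.data.dir} (the paths are made up for illustration; the storageID is the one from your log):

```shell
# Mock two datanode storage dirs sharing one storageID -- on a real
# cluster, grep each node's actual ${dfs.data.dir}/current/VERSION.
mkdir -p /tmp/dn1/current /tmp/dn2/current
echo 'storageID=DS-1120429845-127.0.0.1-50010-1246697164684' > /tmp/dn1/current/VERSION
echo 'storageID=DS-1120429845-127.0.0.1-50010-1246697164684' > /tmp/dn2/current/VERSION

# Any line printed here is a storageID claimed by more than one node --
# exactly the symptom in your namenode log.
grep -h '^storageID=' /tmp/dn1/current/VERSION /tmp/dn2/current/VERSION | sort | uniq -d
```

If that prints anything, the nodes are sharing storage (or one node's storage directory was copied to another).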

This can't be done in Hadoop: each datanode assumes it has sole access to
its storage directory. Besides, sharing one directory defeats the point of
using multiple datanodes to distribute disk and network I/O :) If you're
using NFS, reconfigure all the datanodes to store blocks in local
directories only (by pointing dfs.data.dir at a node-local path) and try
again.
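For example, something like this in each datanode's conf/hadoop-site.xml (the local path is just a placeholder -- use whatever node-local disk you have):

```xml
<!-- On each datanode: point dfs.data.dir at a node-local disk,
     not a shared/NFS mount. The path below is a placeholder. -->
<property>
  <name>dfs.data.dir</name>
  <value>/var/hadoop/dfs/data</value>
</property>
```

A datanode starting with a fresh, empty dfs.data.dir should register with its own new storageID, so the conflict goes away.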

The mention of the namesecondary directory in there also makes me think that
you're trying to start redundant copies of the secondarynamenode (e.g., by
listing the same node twice in the 'conf/masters' file) or that the
fs.checkpoint.dir is the same as the dfs.data.dir. This also isn't allowed
-- dfs.data.dir, dfs.name.dir, and fs.checkpoint.dir must all refer to
distinct physical locations.
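Concretely, a hadoop-site.xml sketch with the three kept distinct (the paths again are placeholders):

```xml
<!-- All three must be distinct physical locations. -->
<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/var/hadoop/dfs/data</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/var/hadoop/dfs/namesecondary</value>
</property>
```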

- Aaron

On Fri, Aug 21, 2009 at 7:19 AM, Sujith Vellat <vtsuj...@gmail.com> wrote:

>
>
> Sent from my iPhone
>
>
> On Aug 21, 2009, at 9:25 AM, Jason Venner <jason.had...@gmail.com> wrote:
>
>  It may be that the individual datanodes resolve different names for their
>> IP addresses than the namenode does.
>> It may also be that some subset of your namenode/datanodes do not have
>> write access to the HDFS storage directories.
>>
>>
>> On Mon, Aug 17, 2009 at 10:05 PM, qiu tian <tianqiu_...@yahoo.com.cn>
>> wrote:
>>
>>  Hi everyone.
>>> I installed Hadoop on three PCs. When I ran 'start-all.sh', only the
>>> jobtracker and tasktrackers started. I use 192.*.*.x as the master and
>>> 192.*.*.y and 192.*.*.z as slaves.
>>>
>>> The namenode log from the master 192.*.*.x is following like this:
>>>
>>> 2009-08-18 10:48:44,543 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>> NameSystem.registerDatanode: node 192.*.*.y:50010 is replaced by
>>> 192.*.*.x:50010 with the same storageID
>>> DS-1120429845-127.0.0.1-50010-1246697164684
>>> 2009-08-18 10:48:44,543 INFO org.apache.hadoop.net.NetworkTopology:
>>> Removing a node: /default-rack/192.*.*.y:50010
>>> 2009-08-18 10:48:44,543 INFO org.apache.hadoop.net.NetworkTopology:
>>> Adding
>>> a new node: /default-rack/192.*.*.x:50010
>>> 2009-08-18 10:48:45,932 FATAL org.apache.hadoop.hdfs.StateChange: BLOCK*
>>> NameSystem.getDatanode: Data node 192.*.*.z:50010 is attempting to report
>>> storage ID DS-1120429845-127.0.0.1-50010-1246697164684. Node
>>> 192.*.*.x:50010
>>> is expected to serve this storage.
>>> 2009-08-18 10:48:45,932 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 8 on 9000, call blockReport(DatanodeRegistration(192.*.*.z:50010,
>>> storageID=DS-1120429845-127.0.0.1-50010-1246697164684, infoPort=50075,
>>> ipcPort=50020), [...@1b8ebe3) from 192.*.*.z:33177: error:
>>> org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node
>>> 192.*.*.z:50010 is attempting to report storage ID
>>> DS-1120429845-127.0.0.1-50010-1246697164684. Node 192.*.*.x:50010 is
>>> expected to serve this storage.
>>> org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node
>>> 192.*.*.z:50010 is attempting to report storage ID
>>> DS-1120429845-127.0.0.1-50010-1246697164684. Node 192.*.*.x:50010 is
>>> expected to serve this storage.
>>>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:3800)
>>>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesystem.java:2771)
>>>       at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:636)
>>>       at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>>>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
>>> 2009-08-18 10:48:46,398 FATAL org.apache.hadoop.hdfs.StateChange: BLOCK*
>>> NameSystem.getDatanode: Data node 192.*.*.y:50010 is attempting to report
>>> storage ID DS-1120429845-127.0.0.1-50010-1246697164684. Node
>>> 192.*.*.x:50010
>>> is expected to serve this storage.
>>> 2009-08-18 10:48:46,398 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 0 on 9000, call
>>> blockReport(DatanodeRegistration(192.*.*.y:50010,
>>> storageID=DS-1120429845-127.0.0.1-50010-1246697164684, infoPort=50075,
>>> ipcPort=50020), [...@186b634) from 192.*.*.y:47367: error:
>>> org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node
>>> 192.*.*.y:50010 is attempting to report storage ID
>>> DS-1120429845-127.0.0.1-50010-1246697164684. Node 192.*.*.x:50010 is
>>> expected to serve this storage.
>>> org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node
>>> 192.*.*.y:50010 is attempting to report storage ID
>>> DS-1120429845-127.0.0.1-50010-1246697164684. Node 192.*.*.x:50010 is
>>> expected to serve this storage.
>>>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanode(FSNamesystem.java:3800)
>>>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(FSNamesystem.java:2771)
>>>       at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(NameNode.java:636)
>>>       at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>>>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
>>> 2009-08-18 10:48:47,000 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
>>> 192.*.*.x
>>>
>>> The message on the shell looks like this:
>>> 192.*.*.x: Exception in thread "main" java.io.IOException: Cannot lock
>>> storage /home/gaojun/HadoopInstall/tmp/dfs/namesecondary. The directory
>>> is
>>> already locked.
>>> 192.*.*.x:     at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:510)
>>> 192.*.*.x:     at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
>>> 192.*.*.x:     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.recoverCreate(SecondaryNameNode.java:517)
>>> 192.*.*.x:     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:145)
>>> 192.*.*.x:     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:115)
>>> 192.*.*.x:     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:469)
>>>
>>> I could not find the reason. Can someone help me?
>>> Thanks!
>>>
>>> yan
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
>> http://www.amazon.com/dp/1430219424?tag=jewlerymall
>> www.prohadoopbook.com a community for Hadoop Professionals
>>
>
