On May 21, 2009, at 2:01 PM, Raghu Angadi wrote:
I think you should file a JIRA on this. Most likely this is what is
happening:
* Two out of the three DNs cannot take any more blocks.
* While picking nodes for a new block, the NN mostly skips the third DN
as well, since the number of active writes on it is larger than 2 * the
cluster average.
* Even if just one other block is being written on the third DN, its
load of 1 is still greater than 2 * (1/3) -- see the sketch below.
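Roughly, the target-selection check works like this (a paraphrase of
the placement heuristic described above, not the exact HDFS source;
names and signatures are approximations):

    public class PlacementSketch {
      // Paraphrase of the check described above; not the actual code.
      static boolean isGoodTarget(long remainingBytes, int activeWrites,
                                  long blockSize,
                                  int totalActiveWrites, int numDatanodes) {
        if (remainingBytes < blockSize) return false; // node is full
        // Skip nodes whose active-write count exceeds twice the
        // cluster-wide average load.
        double avgLoad = (double) totalActiveWrites / numDatanodes;
        return activeWrites <= 2.0 * avgLoad;         // the '2 * avg' cap
      }

      public static void main(String[] args) {
        // 3 DNs, one write in flight on the third: avg = 1/3, and
        // 1 > 2 * (1/3), so even the third DN gets skipped.
        System.out.println(isGoodTarget(64L << 20, 1, 64L << 20, 1, 3));
        // prints: false
      }
    }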
To test this: if you write just one block to an idle cluster, it
should succeed.
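For example (the path is hypothetical; any small write through the
FileSystem API will do):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Write one small file to an otherwise idle cluster; with no
    // competing writes, every DN is under the 2x-average load cap.
    public class OneBlockTest {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // "/tmp/oneblock" is just an example test path.
        FSDataOutputStream out = fs.create(new Path("/tmp/oneblock"));
        out.write("hello".getBytes());
        out.close();
        fs.close();
      }
    }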
Writing from the client on the third DN succeeds since the local node
is always favored.
This particular problem is not that severe on a large cluster, but
HDFS should still do the sensible thing.
Hey Raghu,
If this analysis is right, I would add that it can happen even on
large clusters! I've seen this error on our cluster when we're very
full (>97%) and very few nodes have any free space. This usually
happens because we have two very large nodes (10x bigger than the rest
of the cluster), and HDFS tends to distribute writes randomly --
meaning the smaller nodes fill up quickly, until the balancer can
catch up. (A toy illustration of that effect is sketched below.)
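Just to illustrate (a toy model, not HDFS code): if equal-sized blocks
land on nodes chosen uniformly at random regardless of free space, the
small nodes hit capacity long before the big one does:

    import java.util.Random;

    // Toy simulation: uniform random block placement across nodes of
    // very different sizes, mirroring the cluster shape above.
    public class FillSim {
      public static void main(String[] args) {
        long[] capacity = {1000, 1000, 10000};  // blocks per node
        long[] used = new long[capacity.length];
        Random rnd = new Random();
        long placed = 0;
        // Place blocks until a uniformly chosen node is already full.
        while (true) {
          int n = rnd.nextInt(capacity.length);
          if (used[n] >= capacity[n]) break;
          used[n]++;
          placed++;
        }
        // Typically breaks around ~3000 placements: the small nodes
        // fill first even though the big node is still mostly empty.
        System.out.println("placed=" + placed);
      }
    }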
Brian
Raghu.
Stas Oskin wrote:
Hi.
I'm testing Hadoop in our lab, and started getting the following
message
when trying to copy a file:
Could only be replicated to 0 nodes, instead of 1
I have the following setup:
* 3 machines, 2 of them with only 80GB of space, and 1 with 1.5TB
* Two clients are copying files all the time (one of them is the 1.5TB
machine)
* The replication is set to 2
* I let the 2 smaller machines run out of space, to test the behavior
Now, one of the clients (the one located on the 1.5TB machine) works
fine, but the other one, the external client, is unable to copy and
displays the error and the exception below.
Any idea if this is expected in my scenario? Or how it can be solved?
Thanks in advance.
09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1
09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
    at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)

    at org.apache.hadoop.ipc.Client.call(Client.java:716)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)

09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)