Koji Noguchi wrote:
If restarting the entire dfs helped, then you might be hitting http://issues.apache.org/jira/browse/HADOOP-3633

When we were running 0.17.1, I had to grep for OutOfMemory in the
datanode ".out" files at least once a day and restart those zombie
datanodes.

Once a datanode gets into this state, as Konstantin mentioned in the Jira,
"it appears to happily send heartbeats, but in fact cannot
do any data processing because the server thread is dead."

Koji
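
For anyone wanting to automate the check Koji describes, here is a rough
sketch; the log directory path and the ".out" file-name pattern are
assumptions and will differ per installation:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

/**
 * Minimal sketch: scan datanode ".out" files for OutOfMemory errors.
 * The log directory and file naming are assumptions; adjust to your setup.
 */
public class ScanDatanodeLogs {
  public static void main(String[] args) throws IOException {
    File logDir = new File(args.length > 0 ? args[0] : "/var/log/hadoop");
    File[] outFiles = logDir.listFiles(
        (dir, name) -> name.contains("datanode") && name.endsWith(".out"));
    if (outFiles == null) {
      return;
    }
    for (File f : outFiles) {
      BufferedReader reader = new BufferedReader(new FileReader(f));
      String line;
      int lineNo = 0;
      while ((line = reader.readLine()) != null) {
        lineNo++;
        if (line.contains("OutOfMemory")) {
          // A hit means the datanode is likely a zombie and needs a restart.
          System.out.println(f.getName() + ":" + lineNo + ": " + line);
        }
      }
      reader.close();
    }
  }
}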


My branch of the Hadoop codebase includes the number of live nodes in the cluster in the message when replication problems are encountered, which makes the obvious problem easy to identify (though not the root cause of there being 0 live data nodes):

WARN hdfs.DFSClient : DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test-filename could only be replicated to 0 nodes, instead of 1. ( there are 0 live data nodes in the cluster)

I could supply a patch for this if people want it in the released product. I think it's useful :)
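
For illustration, a minimal sketch of the shape of that change; the
ClusterState interface and checkTargets method below are placeholders for
wherever the NameNode raises this error, not the actual Hadoop internals:

import java.io.IOException;

/**
 * Illustrative sketch only: shows how the "could only be replicated to N
 * nodes" message could carry the live datanode count. The names below are
 * placeholders, not the real NameNode code.
 */
public class ReplicationErrorExample {

  /** Stand-in for whatever tracks live datanodes on the NameNode. */
  interface ClusterState {
    int getNumLiveDataNodes();
  }

  static void checkTargets(String src, int targetsFound, int replication,
                           ClusterState cluster) throws IOException {
    if (targetsFound < replication) {
      // Including the live-node count makes "0 live data nodes" obvious at a glance.
      throw new IOException("File " + src + " could only be replicated to "
          + targetsFound + " nodes, instead of " + replication
          + ". (there are " + cluster.getNumLiveDataNodes()
          + " live data nodes in the cluster)");
    }
  }

  public static void main(String[] args) {
    try {
      checkTargets("/test-filename", 0, 1, () -> 0);
    } catch (IOException e) {
      System.out.println("WARN hdfs.DFSClient : DataStreamer Exception: " + e);
    }
  }
}

Running the sketch prints a warning of the same shape as the log line quoted above.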


