Koji Noguchi wrote:
If restarting the entire dfs helped, then you might be hitting
http://issues.apache.org/jira/browse/HADOOP-3633
When we were running 0.17.1, I had to grep for OutOfMemory in the
datanode ".out" files at least every day and restart those zombie
datanodes.
Once a datanode gets into this state, as Konstantin mentioned in the Jira,
"it appears to happily sending heartbeats, but in fact cannot
do any data processing because the server thread is dead."
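The daily check described above can be sketched roughly as below. This is a hedged sketch, not Koji's actual script: the function name `scan_for_oom` and the log directory argument are assumptions, and the restart step itself (e.g. via hadoop-daemon.sh) is left out since it depends on your deployment.

```shell
# scan_for_oom DIR: print the paths of datanode ".out" files in DIR
# that contain an OutOfMemory message, i.e. likely zombie datanodes.
scan_for_oom() {
  for f in "$1"/*.out; do
    [ -e "$f" ] || continue           # skip if the glob matched nothing
    if grep -q "OutOfMemory" "$f"; then
      echo "$f"                       # candidate for a datanode restart
    fi
  done
}

# Example: scan_for_oom /var/log/hadoop   (log path is an assumption)
```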
Koji
My branch of the hadoop codebase includes the number of live nodes in the
cluster when reporting replication problems, which makes the
obvious problem easy to identify (though not the root cause of there
being 0 live data nodes):
WARN hdfs.DFSClient : DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/test-filename could only be replicated to 0 nodes, instead of 1. (
there are 0 live data nodes in the cluster)
I could supply a patch for this if people want it in the released
product. I think it's useful :)