Hi,

Our production cluster started reporting "too many open files" this afternoon and was subsequently unable to save any snapshots to disk. We were able to recover it, but I would have expected the NN to complain more loudly when it cannot save a snapshot. All I saw in the log was:
"WARN org.apache.hadoop.hdfs.server.common.Storage: rollEdidLog: removing storage <local dir>"
"WARN org.apache.hadoop.hdfs.server.common.Storage: rollEdidLog: removing storage <nfs dir>"

Do you think this should trigger the NN to enter safe mode? The longer this goes unnoticed, the more data could be lost if the NN cannot be recovered.

Regards,
James