Hi,

Is it possible to save checkpoints in a highly available distributed file system other than HDFS (for example, one that may be mounted as directories across the cluster)? If yes, is it configurable?
AFAIK, there is no configuration option available to achieve that. If that's the case, can we have that feature? The intention is to recover applications faster and to do away with HDFS's small files problem, as described here:

http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
http://inquidia.com/news-and-info/working-small-files-hadoop-part-1

If we could save checkpoints in some other distributed file system (or even an HA NAS box) geared for small files, we could achieve:

- Better performance of the NN and HDFS for production usage (read: production data I/O, not temp files)
- Faster application recovery in case of planned shutdowns or unplanned restarts

Please send your comments, suggestions, or ideas.

Thanks,
Aniruddha
