Hi Community, or let me say BigFoots, do you think this feature should be available?
The reason for bringing this up was described at the start of this thread:

> This is intended to make application recovery faster and to avoid HDFS's
> small files problem, as described here:
>
> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
>
> If we could save checkpoints in some other distributed file system (or
> even an HA NAS box) geared for small files, we could achieve:
>
> - Better NameNode (NN) and HDFS performance for production usage (read:
>   production data I/O, not temp files)
> - Faster application recovery after planned shutdowns or unplanned
>   restarts

If you feel this feature is needed, please share your opinions and ideas so that it can be turned into a JIRA.

Thanks,
Aniruddha

On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta <[email protected]> wrote:

> Aniruddha,
>
> Currently we don't have any support for that.
>
> Thanks,
> Gaurav
>
> On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi <[email protected]> wrote:
>
> > The default FSStorageAgent can be used, as it can work with the local
> > filesystem, but as far as I know there is no support for specifying the
> > directory through the XML file. By default it uses the application
> > directory on HDFS.
> >
> > I'm not sure if we can specify a storage agent with its properties
> > through the configuration at the DAG level.
> >
> > - Tushar.
> >
> > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Do we have any storage agent that I can use readily, configurable
> > > through dt-site.xml?
> > >
> > > I am looking for something that would save checkpoints in a mounted
> > > file system (e.g. an HA NAS), which is basically just another
> > > directory for Apex.
> > >
> > > Thanks,
> > > Aniruddha
> > >
> > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <[email protected]> wrote:
> > >
> > > > It is already supported; refer to the following JIRA for more
> > > > information:
> > > >
> > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > > >
> > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Is it possible to save checkpoints in a highly available
> > > > > distributed file system other than HDFS (which may be directories
> > > > > mounted across the cluster)?
> > > > > If yes, is it configurable?
> > > > >
> > > > > AFAIK, there is no configurable option available to achieve that.
> > > > > If that's the case, can we have that feature?
> > > > >
> > > > > This is intended to make application recovery faster and to avoid
> > > > > HDFS's small files problem, as described here:
> > > > >
> > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > >
> > > > > If we could save checkpoints in some other distributed file system
> > > > > (or even an HA NAS box) geared for small files, we could achieve:
> > > > >
> > > > > - Better NameNode (NN) and HDFS performance for production usage
> > > > >   (read: production data I/O, not temp files)
> > > > > - Faster application recovery after planned shutdowns or
> > > > >   unplanned restarts
> > > > >
> > > > > Please send your comments, suggestions, or ideas.
> > > > >
> > > > > Thanks,
> > > > > Aniruddha
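As a rough illustration of what "just another directory for Apex" could mean, here is a minimal Java sketch of a directory-backed checkpoint store. Everything in it is hypothetical: `CheckpointStore` and `DirectoryCheckpointStore` are names invented for this example, the method set is only assumed to resemble the save/load/delete/list operations a storage agent needs, and it is not Apex's FSStorageAgent or StorageAgent API. It assumes the target directory (for example, an HA NAS mount point) is visible at the same path on every node and supports atomic rename.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical interface for illustration only; NOT the Apex StorageAgent API.
interface CheckpointStore {
    void save(Object state, int operatorId, long windowId) throws IOException;
    Object load(int operatorId, long windowId) throws IOException;
    void delete(int operatorId, long windowId) throws IOException;
    long[] getWindowIds(int operatorId) throws IOException;
}

// Stores each checkpoint as <baseDir>/<operatorId>/<windowId>, where baseDir
// is any mounted directory (assumed here to be an HA NAS mount).
class DirectoryCheckpointStore implements CheckpointStore {
    private final Path baseDir;

    DirectoryCheckpointStore(Path baseDir) {
        this.baseDir = baseDir;
    }

    private Path file(int operatorId, long windowId) {
        return baseDir.resolve(Integer.toString(operatorId)).resolve(Long.toString(windowId));
    }

    @Override
    public void save(Object state, int operatorId, long windowId) throws IOException {
        Path target = file(operatorId, windowId);
        Files.createDirectories(target.getParent());
        // Serialize to a temp file first, then rename it into place, so a crash
        // mid-write never leaves a partial checkpoint under the final name.
        // Assumes the mount supports atomic rename.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(state);
        }
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    @Override
    public Object load(int operatorId, long windowId) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file(operatorId, windowId)))) {
            return in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException("Cannot deserialize checkpoint", e);
        }
    }

    @Override
    public void delete(int operatorId, long windowId) throws IOException {
        Files.deleteIfExists(file(operatorId, windowId));
    }

    @Override
    public long[] getWindowIds(int operatorId) throws IOException {
        Path dir = baseDir.resolve(Integer.toString(operatorId));
        if (!Files.isDirectory(dir)) {
            return new long[0];
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(p -> p.getFileName().toString())
                    .filter(name -> name.matches("\\d+")) // skip in-flight *.tmp files
                    .mapToLong(Long::parseLong)
                    .sorted()
                    .toArray();
        }
    }
}
```

Each checkpoint becomes one small file per operator per window, which is the access pattern the thread wants to move off HDFS; the temp-file-then-rename step is a common way to keep a crash from leaving a half-written checkpoint behind.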
