Aniruddha,

We have not heard this request from users yet. That may be because our checkpointing includes a purge, so the small checkpoint files are not left behind. The small-files problem in Hadoop is a long-standing one, and it relates to storing small files in Hadoop for a long time (more likely forever).

Thanks,
Amol

On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <[email protected]> wrote:

> Hi Community,
>
> Or, let me say, BigFoots, do you think this feature should be available?
>
> The reason to bring this up was discussed in the start of this thread as:
>
> > This is with the intention to recover the applications faster and do away
> > with HDFS's small files problem as described here:
> >
> > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> >
> > If we could save checkpoints in some other distributed file system (or
> > even a HA NAS box) geared for small files, we could achieve -
> >
> > - Better performance of NN & HDFS for the production usage (read:
> >   production data I/O & not temp files)
> > - Faster application recovery in case of planned shutdown / unplanned
> >   restarts
>
> If you feel the need for this feature, please share your opinions and ideas
> so that it can be converted into a JIRA.
>
> Thanks,
>
> Aniruddha
>
> On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta <[email protected]> wrote:
>
> > Aniruddha,
> >
> > Currently we don't have any support for that.
> >
> > Thanks
> > -Gaurav
> >
> > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi <[email protected]> wrote:
> >
> > > The default FSStorageAgent can be used, as it can work with the local
> > > filesystem, but as far as I know there is no support for specifying the
> > > directory through the xml file. By default it uses the application
> > > directory on HDFS.
> > >
> > > Not sure if we could specify the storage agent with its properties
> > > through the configuration at dag level.
> > >
> > > - Tushar.
> > >
> > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Do we have any storage agent which I can use readily, configurable
> > > > through dt-site.xml?
> > > >
> > > > I am looking for something which would save checkpoints in a mounted
> > > > file system [e.g. HA-NAS], which is basically just another directory
> > > > for Apex.
> > > >
> > > > Thanks,
> > > >
> > > > Aniruddha
> > > >
> > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <[email protected]> wrote:
> > > >
> > > > > It is already supported; refer to the following jira for more
> > > > > information:
> > > > >
> > > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > > > >
> > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Is it possible to save checkpoints in any other highly available
> > > > > > distributed file systems (which may be mounted directories across
> > > > > > the cluster) other than HDFS?
> > > > > > If yes, is it configurable?
> > > > > >
> > > > > > AFAIK, there is no configurable option available to achieve that.
> > > > > > If that's the case, can we have that feature?
> > > > > >
> > > > > > This is with the intention to recover the applications faster and do
> > > > > > away with HDFS's small files problem as described here:
> > > > > >
> > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > > > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > > > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > > >
> > > > > > If we could save checkpoints in some other distributed file system
> > > > > > (or even a HA NAS box) geared for small files, we could achieve -
> > > > > >
> > > > > > - Better performance of NN & HDFS for the production usage (read:
> > > > > >   production data I/O & not temp files)
> > > > > > - Faster application recovery in case of planned shutdown /
> > > > > >   unplanned restarts
> > > > > >
> > > > > > Please, send your comments, suggestions or ideas.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Aniruddha
