Aniruddha,
We have not heard this request from users yet. That may be because our
checkpointing purges old checkpoints, i.e. the small files are not left
behind. The small files problem in Hadoop arises when small files are
stored for a long time (more likely forever).
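
To make the purge point concrete, here is a minimal Java sketch. It is not
Apex code: CheckpointDir, its methods and the "<operatorId>_<windowId>" file
naming are all assumptions made for illustration. It writes one small file
per checkpoint under any directory (a mounted NAS path would look the same
to it) and deletes checkpoints older than the committed window, so the
number of small files stays bounded.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Apex: stores one small file per
// (operator, window) checkpoint under any directory, e.g. a NAS mount.
final class CheckpointDir {
    private final Path root;

    CheckpointDir(Path root) throws IOException {
        this.root = Files.createDirectories(root);
    }

    // The "<operatorId>_<windowId>" file name is an assumption of this sketch.
    Path save(int operatorId, long windowId, byte[] state) throws IOException {
        Path tmp = root.resolve(operatorId + "_" + windowId + ".tmp");
        Path dst = root.resolve(operatorId + "_" + windowId);
        Files.write(tmp, state);
        // Write to a temporary name first, then move it to the final name.
        return Files.move(tmp, dst, StandardCopyOption.REPLACE_EXISTING);
    }

    // Delete this operator's checkpoints older than the committed window
    // and return how many files were removed.
    int purge(int operatorId, long committedWindowId) throws IOException {
        String prefix = operatorId + "_";
        List<Path> stale;
        try (Stream<Path> files = Files.list(root)) {
            stale = files.filter(p -> {
                String name = p.getFileName().toString();
                return name.startsWith(prefix)
                        && !name.endsWith(".tmp")
                        && Long.parseLong(name.substring(prefix.length())) < committedWindowId;
            }).collect(Collectors.toList());
        }
        for (Path p : stale) {
            Files.delete(p);
        }
        return stale.size();
    }

    // Number of files currently in the checkpoint directory.
    long count() throws IOException {
        try (Stream<Path> files = Files.list(root)) {
            return files.count();
        }
    }
}
```

Calling purge(operatorId, committedWindowId) after each committed window
keeps only the checkpoints that recovery could still need, which is why
the files do not pile up the way long-lived small files do.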

Thanks,
Amol


On Mon, Feb 1, 2016 at 6:05 AM, Aniruddha Thombare <
[email protected]> wrote:

> Hi Community,
>
> Or, let me say, BigFoots: do you think this feature should be available?
>
> The reason for bringing this up was given at the start of this thread:
>
> > This is with the intention to recover the applications faster and do away
> > with HDFS's small files problem as described here:
> > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> >
> > If we could save checkpoints in some other distributed file system (or
> > even a HA NAS box) geared for small files, we could achieve -
> >
> >    - Better performance of NN & HDFS for the production usage (read:
> >    production data I/O & not temp files)
> >    - Faster application recovery in case of planned shutdown / unplanned
> >    restarts
>
> If you feel the need for this feature, please share your opinions and ideas
> so that it can be converted into a JIRA.
>
>
>
> Thanks,
>
>
> Aniruddha
>
> On Thu, Jan 21, 2016 at 11:19 PM, Gaurav Gupta <[email protected]>
> wrote:
>
> > Aniruddha,
> >
> > Currently we don't have any support for that.
> >
> > Thanks
> > Gaurav
> >
> > Thanks
> > -Gaurav
> >
> > On Thu, Jan 21, 2016 at 12:24 AM, Tushar Gosavi <[email protected]>
> > wrote:
> >
> > > The default FSStorageAgent can be used, as it can work with the local
> > > filesystem, but as far as I know there is no support for specifying the
> > > directory through an XML file. By default it uses the application
> > > directory on HDFS.
> > >
> > > I am not sure whether we can specify a storage agent with its properties
> > > through the configuration at the DAG level.
> > >
> > > - Tushar.
> > >
> > >
> > > On Thu, Jan 21, 2016 at 12:14 PM, Aniruddha Thombare <
> > > [email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Do we have any storage agent that I can use readily and configure
> > > > through dt-site.xml?
> > > >
> > > > I am looking for something that would save checkpoints in a mounted
> > > > file system [e.g. HA-NAS], which is basically just another directory
> > > > for Apex.
> > > >
> > > >
> > > >
> > > >
> > > > Thanks,
> > > >
> > > >
> > > > Aniruddha
> > > >
> > > > On Wed, Jan 20, 2016 at 8:33 PM, Sandesh Hegde <
> > [email protected]>
> > > > wrote:
> > > >
> > > > > It is already supported; refer to the following JIRA for more
> > > > > information:
> > > > >
> > > > > https://issues.apache.org/jira/browse/APEXCORE-283
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jan 19, 2016 at 10:43 PM Aniruddha Thombare <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > Is it possible to save checkpoints in any other highly available
> > > > > > > distributed file systems (which may be mounted directories across
> > > > > > > the cluster) other than HDFS?
> > > > > > If yes, is it configurable?
> > > > > >
> > > > > > AFAIK, there is no configurable option available to achieve that.
> > > > > > If that's the case, can we have that feature?
> > > > > >
> > > > > > > This is with the intention to recover the applications faster and
> > > > > > > do away with HDFS's small files problem as described here:
> > > > > > >
> > > > > > > http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> > > > > > > http://snowplowanalytics.com/blog/2013/05/30/dealing-with-hadoops-small-files-problem/
> > > > > > > http://inquidia.com/news-and-info/working-small-files-hadoop-part-1
> > > > > > >
> > > > > > > If we could save checkpoints in some other distributed file system
> > > > > > > (or even a HA NAS box) geared for small files, we could achieve -
> > > > > > >
> > > > > > >    - Better performance of NN & HDFS for the production usage (read:
> > > > > > >    production data I/O & not temp files)
> > > > > > >    - Faster application recovery in case of planned shutdown /
> > > > > > >    unplanned restarts
> > > > > >
> > > > > > > Please send your comments, suggestions or ideas.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >
> > > > > > Aniruddha
> > > > > >
> > > > >
> > > >
> > >
> >
>
