Hey, you can read more about why small files are difficult for HDFS at http://www.cloudera.com/blog/2009/02/02/the-small-files-problem.
Regards,
Jeff

2009/5/7 Piotr Praczyk <piotr.prac...@gmail.com>

> If you want to use many small files, they probably have the same purpose
> and structure. Why not use HBase instead of raw HDFS? Many small files
> would be packed together and the problem would disappear.
>
> cheers
> Piotr
>
> 2009/5/7 Jonathan Cao <jonath...@rockyou.com>
>
> > There are at least two design choices in Hadoop that have implications
> > for your scenario.
> >
> > 1. All the HDFS metadata is stored in name node memory -- the memory
> > size is one limitation on how many "small" files you can have.
> >
> > 2. The efficiency of the map/reduce paradigm dictates that each
> > mapper/reducer job has enough work to offset the overhead of spawning
> > the job. It relies on each task reading a contiguous chunk of data
> > (typically 64MB); your small-file situation will turn those efficient
> > sequential reads into a larger number of inefficient random reads.
> >
> > Of course, small is a relative term.
> >
> > Jonathan
> >
> > 2009/5/6 陈桂芬 <chenguifen...@163.com>
> >
> > > Hi:
> > >
> > > In my application, there are many small files. But Hadoop is designed
> > > to deal with many large files.
> > >
> > > I want to know why Hadoop doesn't support small files very well and
> > > where the bottleneck is. And what can I do to improve Hadoop's
> > > capability of dealing with small files.
> > >
> > > Thanks.
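
A note on the two points above, for anyone hitting this later. On the namenode
side, the usual rule of thumb is that each file, directory and block costs on
the order of 150 bytes of namenode heap, so tens of millions of tiny files can
eat gigabytes of memory before any data is read. The common workaround (also
covered in the Cloudera post) is to pack the small files into one container
file, e.g. a SequenceFile keyed by the original file name, so the namenode
tracks a single large file and map tasks get their sequential reads back.
A rough, untested sketch against the classic FileSystem/SequenceFile API
(the /user/foo paths are just placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs every file in a directory into one SequenceFile, keyed by the
// original file name: one big file means one set of namenode entries and
// large sequential reads for map tasks, instead of one entry and one
// random read per tiny file. Paths are placeholders, not a real layout.
public class SmallFilePacker {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path input = new Path("/user/foo/small-files");  // directory of small files
    Path output = new Path("/user/foo/packed.seq");  // single container file

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, output, Text.class, BytesWritable.class);
    try {
      for (FileStatus status : fs.listStatus(input)) {
        if (status.isDir()) {
          continue;
        }
        byte[] contents = new byte[(int) status.getLen()];
        FSDataInputStream in = fs.open(status.getPath());
        try {
          in.readFully(contents);  // fine for files well under the block size
        } finally {
          in.close();
        }
        writer.append(new Text(status.getPath().getName()),
                      new BytesWritable(contents));
      }
    } finally {
      writer.close();
    }
  }
}

A MapReduce job can then read the packed file with SequenceFileInputFormat;
HBase, as Piotr suggests, gets you the same effect by storing the small
records inside its own large HDFS-backed files.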