Hey, you can read more about why small files are difficult for HDFS at http://www.cloudera.com/blog/2009/02/02/the-small-files-problem.
Regards,
Jeff

2009/5/7 Piotr Praczyk <piotr.prac...@gmail.com>

> If you want to use many small files, they probably have the same purpose
> and structure. Why not use HBase instead of raw HDFS? Many small files
> would be packed together and the problem would disappear.
>
> cheers
> Piotr
>
> 2009/5/7 Jonathan Cao <jonath...@rockyou.com>
>
> > There are at least two design choices in Hadoop that have implications
> > for your scenario.
> >
> > 1. All the HDFS metadata is stored in name node memory -- the memory
> > size is one limitation on how many "small" files you can have.
> >
> > 2. The efficiency of the map/reduce paradigm dictates that each
> > mapper/reducer job has enough work to offset the overhead of spawning
> > the job. It relies on each task reading a contiguous chunk of data
> > (typically 64MB); your small-file situation will turn those efficient
> > sequential reads into a larger number of inefficient random reads.
> >
> > Of course, small is a relative term.
> >
> > Jonathan
> >
> > 2009/5/6 陈桂芬 <chenguifen...@163.com>
> >
> > > Hi:
> > >
> > > In my application, there are many small files. But Hadoop is designed
> > > to deal with many large files.
> > >
> > > I want to know why Hadoop doesn't support small files very well and
> > > where the bottleneck is. And what can I do to improve Hadoop's
> > > capability of dealing with small files.
> > >
> > > Thanks.
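
A note on the two points above, for anyone hitting this later. On the namenode
side, the usual rule of thumb is that each file, directory and block costs on
the order of 150 bytes of namenode heap, so tens of millions of tiny files can
eat gigabytes of memory before any data is read. The common workaround (also
covered in the Cloudera post) is to pack the small files into one container
file, e.g. a SequenceFile keyed by the original file name, so the namenode
tracks a single large file and map tasks get their sequential reads back.
A rough, untested sketch against the classic FileSystem/SequenceFile API
(the /user/foo paths are just placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs every file in a directory into one SequenceFile, keyed by the
// original file name: one big file means one set of namenode entries and
// large sequential reads for map tasks, instead of one entry and one
// random read per tiny file. Paths are placeholders, not a real layout.
public class SmallFilePacker {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path input = new Path("/user/foo/small-files");  // directory of small files
    Path output = new Path("/user/foo/packed.seq");  // single container file

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, output, Text.class, BytesWritable.class);
    try {
      for (FileStatus status : fs.listStatus(input)) {
        if (status.isDir()) {
          continue;
        }
        byte[] contents = new byte[(int) status.getLen()];
        FSDataInputStream in = fs.open(status.getPath());
        try {
          in.readFully(contents);  // fine for files well under the block size
        } finally {
          in.close();
        }
        writer.append(new Text(status.getPath().getName()),
                      new BytesWritable(contents));
      }
    } finally {
      writer.close();
    }
  }
}

A MapReduce job can then read the packed file with SequenceFileInputFormat;
HBase, as Piotr suggests, gets you the same effect by storing the small
records inside its own large HDFS-backed files.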