There are at least two design choices in Hadoop that have implications for
your scenario.
1. All HDFS metadata is stored in NameNode memory -- the size of that memory
is one limit on how many "small" files you can have (a rough estimation
sketch follows this list).

2. The efficiency of the map/reduce paradigm depends on each mapper/reducer
task having enough work to offset the overhead of spawning it. It relies on
each task reading a contiguous chunk of data (typically a 64MB block); many
small files turn those efficient sequential reads into a larger number of
inefficient random reads (some rough numbers further down).
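
To give a feel for point 1, here is a rough sketch (my own illustration, not
anything from the Hadoop codebase; the ~150 bytes of NameNode heap per
file/block object is only a commonly quoted rule of thumb) that walks one
directory and estimates the metadata cost:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NameNodeMemoryEstimate {
  // Rule-of-thumb cost of one metadata object (file or block) in NameNode heap.
  private static final long BYTES_PER_OBJECT = 150L;

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    long files = 0, blocks = 0;
    for (FileStatus s : fs.listStatus(new Path(args[0]))) {
      if (s.isDir()) continue;                        // skip subdirectories here
      files++;
      long blockSize = s.getBlockSize();              // typically 64MB
      blocks += Math.max(1, (s.getLen() + blockSize - 1) / blockSize);
    }
    System.out.println(files + " files, " + blocks + " blocks -> roughly "
        + (files + blocks) * BYTES_PER_OBJECT + " bytes of NameNode heap");
  }
}

The point is that the heap cost scales with the number of files and blocks,
not with the number of bytes, so millions of tiny files can exhaust the
NameNode long before the DataNodes run out of disk.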

Of course, "small" is a relative term.
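
And to put some made-up numbers on point 2 (since FileInputFormat splits
never span files, every small file becomes at least one map task):

public class SplitCountExample {
  public static void main(String[] args) {
    long blockSize = 64L << 20;   // 64MB default block size
    long totalBytes = 10L << 30;  // 10GB of input data

    // One big file: roughly one map task per block.
    long mapsOneBigFile = (totalBytes + blockSize - 1) / blockSize;   // ~160 maps

    // The same bytes stored as 1MB files: one map task per file.
    long mapsManySmallFiles = totalBytes / (1L << 20);                // 10,240 maps

    System.out.println("one 10GB file:      " + mapsOneBigFile + " maps");
    System.out.println("10,240 x 1MB files: " + mapsManySmallFiles + " maps");
  }
}

Each of those 10,240 maps pays the same startup cost but does only 1MB of
useful work, which is where the overhead comes from.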

Jonathan

2009/5/6 陈桂芬 <chenguifen...@163.com>

> Hi:
>
> In my application there are many small files, but Hadoop is designed to
> deal with many large files.
>
> I want to know why Hadoop doesn't support small files very well and where
> the bottleneck is, and what I can do to improve Hadoop's capability of
> dealing with small files.
>
> Thanks.
>
>
