Brendan, since you are looking for a distributed file system that can store
many millions of files, try out MapR.  A few customers have actually crossed
1 trillion files without hitting problems.  Small files and large files are
handled equally well.

Of course, if you are doing map-reduce, it is better to process more data
per mapper (I'd say the sweet spot is between 64 MB and 256 MB of data), so it
might make sense to process many small files per mapper; see the sketch below.
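On stock Hadoop, one way to get there is CombineTextInputFormat, which packs
many small files into each input split.  Here is a minimal sketch, assuming a
Hadoop 2.x-style API; the class name, job name, and the argv paths are just
made up for illustration:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class CombineSmallFiles {
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "combine-small-files");
      job.setJarByClass(CombineSmallFiles.class);

      // Pack many small files into each split so that one mapper sees
      // up to ~256 MB of data instead of a single tiny file.
      job.setInputFormatClass(CombineTextInputFormat.class);
      CombineTextInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

      job.setMapperClass(Mapper.class);  // identity mapper, for illustration
      job.setOutputKeyClass(LongWritable.class);
      job.setOutputValueClass(Text.class);

      CombineTextInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

The 256 MB cap matches the upper end of the sweet spot above; tune it to how
much data you want each mapper to chew on.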

On Tue, May 22, 2012 at 2:39 AM, Brendan cheng <ccp...@hotmail.com> wrote:

>
> Hi,
> I read the HDFS architecture doc and it said HDFS is tuned for storing
> large files, typically gigabytes to terabytes.  What is the downside of
> storing millions of small files like <10MB?  Or what settings of HDFS are
> suitable for storing small files?
> Actually, I plan to find a distributed file system for storing many
> millions of files.
> Brendan
