Brendan, since you are looking for a distributed file system that can store many millions of files, try out MapR. A few customers have actually crossed one trillion files without hitting problems. Small files and large files are handled equally well.
Of course, if you are doing map-reduce, it is better to process more data per mapper (I'd say the sweet spot is between 64M and 256M of data), so it might make sense to process many small files per mapper.

On Tue, May 22, 2012 at 2:39 AM, Brendan cheng <ccp...@hotmail.com> wrote:
>
> Hi,
> I read the HDFS architecture doc, and it said HDFS is tuned for storing
> large files, typically gigabytes to terabytes. What is the downside of
> storing millions of small files, say <10MB each? Or what HDFS settings
> are suitable for storing small files?
> Actually, I plan to find a distributed file system for storing many
> millions of files.
> Brendan
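To make the "many small files per mapper" point concrete, here is a rough sketch of greedy split packing. This is a hypothetical illustration (not Hadoop's actual CombineFileInputFormat code, and the 128M target is just a value in the 64M-256M sweet spot): it shows how grouping small files into larger splits cuts the mapper count.

```python
def pack_into_splits(file_sizes, target_bytes=128 * 1024 * 1024):
    """Greedily pack file sizes into splits of roughly target_bytes each.

    Simplified sketch of the idea behind combining small files per mapper:
    instead of one mapper per file, many small files share one split.
    """
    splits, current, current_size = [], [], 0
    for size in file_sizes:
        # Flush the current split once adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits

# 10,000 files of 5 MB each: one mapper per file would mean 10,000 mappers;
# packing into ~128 MB splits needs only 400 (25 files of 5 MB per split).
sizes = [5 * 1024 * 1024] * 10_000
print(len(pack_into_splits(sizes)))  # -> 400
```

With one mapper per 5 MB file, each mapper does very little work relative to its startup cost; packing 25 files into a ~125 MB split keeps each mapper inside the sweet spot above.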