Thanks for your advice, guys! Let me add some more detail about my use case:
(1) millions of files to store
(2) 99% static, no changes once written
(3) fast downloads / high availability
(4) cost effective
(5) in the future, I would like to extend it with a versioning system on the files

From an administrative point of view, most Hadoop functionality works for me. I looked a little at HBase and want to compare it with MongoDB, since both are more or less key-value stores, but MongoDB gives me more functionality than I need at the moment. What do you think?
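For context, here is a rough sketch of how I imagine using HBase for this: one row per file, with the raw bytes in a single cell. The table name "files", family "f", and qualifier "data" are just placeholders I made up, and I am assuming the old 0.92/0.94-style client API; HBase's built-in per-cell versions might also cover my versioning requirement (5).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SmallFileStore {
    // Assumed layout: table "files", column family "f", qualifier "data"
    // holding the raw file bytes. Row key is the file path.
    private final HTable table;

    public SmallFileStore(Configuration conf) throws Exception {
        table = new HTable(conf, "files");
    }

    public void putFile(String path, byte[] contents) throws Exception {
        Put put = new Put(Bytes.toBytes(path));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("data"), contents);
        table.put(put);
    }

    public byte[] getFile(String path) throws Exception {
        Result result = table.get(new Get(Bytes.toBytes(path)));
        return result.getValue(Bytes.toBytes("f"), Bytes.toBytes("data"));
    }
}

Does that kind of layout make sense for files under ~10MB, or is it abusing HBase?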
________________________________
> Date: Tue, 22 May 2012 21:56:31 -0700
> Subject: Re: Storing millions of small files
> From: mcsri...@gmail.com
> To: hdfs-user@hadoop.apache.org
>
> Brendan, since you are looking for a distributed file system that can store
> multi millions of files, try out MapR. A few customers have actually
> crossed over 1 trillion files without hitting problems. Small files and
> large files are handled equally well.
>
> Of course, if you are doing map-reduce, it is better to process more
> data per mapper (I'd say the sweet spot is between 64M - 256M of data),
> so it might make sense to process many small files per mapper.
>
> On Tue, May 22, 2012 at 2:39 AM, Brendan cheng
> <ccp...@hotmail.com> wrote:
>
> Hi,
> I read the HDFS architecture doc and it said HDFS is tuned for storing
> large files, typically gigabytes to terabytes. What is the downside of
> storing millions of small files like <10MB? Or what settings of HDFS are
> suitable for storing small files?
> Actually, I plan to find a distributed file system for storing multi
> million files.
> Brendan
>
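P.S. On the point above about processing many small files per mapper: I assume something like packing the small files into a SequenceFile (file name as key, raw bytes as value) is what's meant. A minimal sketch of how I would do the packing, with the output path invented for illustration:

import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/user/brendan/packed.seq");  // placeholder output path

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, BytesWritable.class);
        try {
            // Each command-line argument is a local small file to pack.
            for (String name : args) {
                byte[] data = readAll(new File(name));
                writer.append(new Text(name), new BytesWritable(data));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }

    private static byte[] readAll(File f) throws Exception {
        byte[] buf = new byte[(int) f.length()];
        FileInputStream in = new FileInputStream(f);
        try {
            int off = 0;
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) break;
                off += n;
            }
        } finally {
            in.close();
        }
        return buf;
    }
}

Would that be the recommended way to get each mapper into the 64M - 256M range, or is there a better approach?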