What you describe is pretty much my use case as well. Since I don’t know how 
big the number of files could get, I am trying to figure out whether there is a 
theoretical design limitation in HDFS.

From what I have read, the namenode stores all metadata for all files in RAM. 
Assuming (in my case) that each file is smaller than the configured block size, 
shouldn’t there be a very rough formula for calculating the maximum number of 
files HDFS can serve, based on the RAM configured on the namenode?
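
For concreteness, here is the kind of back-of-envelope estimate I have in mind, 
assuming the commonly quoted figure of roughly 150 bytes of namenode heap per 
namespace object (file, directory, or block) -- please treat the numbers as 
illustrative, not authoritative:

# Rough namenode capacity estimate. Assumes ~150 bytes of namenode heap per
# namespace object (file, directory, or block) -- a commonly cited rule of
# thumb, not an exact figure.
BYTES_PER_NAMESPACE_OBJECT = 150

def max_small_files(namenode_heap_gb, blocks_per_file=1):
    """Estimate how many files fit in a given namenode heap.

    Each file costs one file object plus `blocks_per_file` block objects;
    a file smaller than the block size occupies exactly one block.
    """
    heap_bytes = namenode_heap_gb * 1024 ** 3
    bytes_per_file = BYTES_PER_NAMESPACE_OBJECT * (1 + blocks_per_file)
    return heap_bytes // bytes_per_file

# e.g. a 16 GB namenode heap with one block per file:
print(max_small_files(16))   # roughly 57 million files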

Can any of the implementers comment on this? Am I even thinking on the right 
track…?

Thanks, Ian, for the Haystack link; very informative indeed.

-Chinmay



From: Stuart Smith [mailto:stu24m...@yahoo.com]
Sent: Wednesday, February 02, 2011 4:41 PM
To: hdfs-user@hadoop.apache.org
Subject: RE: HDFS without Hadoop: Why?

Hello,
   I'm actually using hbase/hadoop/hdfs for lots of small files (with a long 
tail of larger files). Well, millions of small files - I don't know what you 
mean by lots :)

Facebook probably knows better, but what I do is:

  - store metadata in hbase
  - store files smaller than 10 MB or so in hbase
  - store larger files in an hdfs directory tree

I started out storing files of 64 MB (the chunk size) and smaller in hbase, but 
that caused issues with regionservers when running M/R jobs. This is related to 
the fact that I'm running a cobbled-together cluster and my regionservers don't 
have that much memory. I would play with the size to see what works for you; 
there's a rough sketch of the routing logic below.
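
Roughly, the routing is just a size check. A minimal sketch in Python, where 
hbase_put, hdfs_put, and metadata_put are hypothetical callables standing in 
for the real hbase/hdfs client calls (not actual library APIs):

import os

# Files at or below this size go into hbase cells; anything bigger goes to hdfs.
# 10 MB works on my cluster; tune it to your regionserver memory.
SMALL_FILE_THRESHOLD = 10 * 1024 * 1024

def store_file(local_path, hbase_put, hdfs_put, metadata_put):
    """Route one file: small -> hbase cell, large -> hdfs directory tree.

    hbase_put, hdfs_put, and metadata_put are hypothetical callables that
    stand in for the real hbase/hdfs client calls.
    """
    size = os.path.getsize(local_path)
    if size <= SMALL_FILE_THRESHOLD:
        location = hbase_put(local_path)   # bytes end up in an hbase cell
    else:
        location = hdfs_put(local_path)    # bytes end up under an hdfs path
    # Metadata (name, size, where the bytes actually live) always goes to hbase.
    metadata_put(local_path, size, location)
    return location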

Take care,
   -stu

--- On Wed, 2/2/11, Dhodapkar, Chinmay <chinm...@qualcomm.com> wrote:

From: Dhodapkar, Chinmay <chinm...@qualcomm.com>
Subject: RE: HDFS without Hadoop: Why?
To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
Date: Wednesday, February 2, 2011, 7:28 PM

Hello,



I have been following this thread for some time now. I am very comfortable with 
the advantages of HDFS, but I still have lingering questions about using HDFS 
for general-purpose storage (no MapReduce/HBase, etc.).



Can somebody shed light on what the limitations are on the number of files that 
can be stored? Is it limited in any way by the namenode? The use case I am 
interested in is storing a very large number of relatively small files (1 MB to 
25 MB).



Interestingly, I saw a Facebook presentation on how they use HBase/HDFS 
internally. They seem to store all metadata in HBase and the actual 
images/files/etc. in something called “Haystack” (why not use HDFS, since they 
already have it?). Does anybody know what “Haystack” is?



Thanks!

Chinmay







From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
Sent: Wednesday, February 02, 2011 3:31 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: HDFS without Hadoop: Why?



  *   Large block size wastes space for small file.  The minimum file size is 1 
block.

That's incorrect. If a file is smaller than the block size, it will only 
consume as much space as there is data in the file.
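
To put numbers on it (a 64 MB block size and 3x replication are assumed here 
purely for illustration):

# A 1 MB file with a 64 MB block size consumes ~1 MB per replica on disk,
# not a full 64 MB block. (64 MB block size and 3x replication assumed
# purely for illustration.)
block_size = 64 * 1024 * 1024
replication = 3
file_size = 1 * 1024 * 1024

actual = file_size * replication      # 3 MB across replicas
claimed = block_size * replication    # 192 MB, what the "minimum 1 block" claim would imply
print(actual / 2 ** 20, "MB vs", claimed / 2 ** 20, "MB")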

  *   There are no hardlinks, softlinks, or quotas.

That's incorrect; there are quotas and softlinks.

