Stuart - if Dhruba is giving the per-file and per-block memory costs used by the namenode, you really cannot get a more authoritative number anywhere else :) I would do the back-of-envelope with ~160 bytes/file and ~150 bytes/block.
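In code, that back-of-envelope looks roughly like the sketch below. It is an estimate only: the ~160/~150 byte figures are the rough per-object numbers from this thread, not exact object sizes, and a real namenode heap needs extra headroom beyond them.

```java
/** Back-of-envelope namenode heap estimate using the rough per-object
 *  figures from this thread (~160 bytes per file, ~150 bytes per block).
 *  These are approximations, not exact namenode object sizes. */
public class NamenodeHeapEstimate {
    static final long BYTES_PER_FILE  = 160L;
    static final long BYTES_PER_BLOCK = 150L;

    static long roughHeapBytes(long files, long blocks) {
        return files * BYTES_PER_FILE + blocks * BYTES_PER_BLOCK;
    }

    public static void main(String[] args) {
        long files  = 100000000L;  // 100 million files (the Yahoo example below)
        long blocks = 200000000L;  // referencing 200 million blocks
        double gb = roughHeapBytes(files, blocks) / (1024.0 * 1024 * 1024);
        // Prints roughly 43 GB for the file and block objects alone;
        // the "at least 60 GB" figure quoted below leaves headroom for the rest of the heap.
        System.out.printf("~%.0f GB for file and block objects%n", gb);
    }
}
```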
On Wed, Feb 2, 2011 at 9:08 PM, Stuart Smith <stu24m...@yahoo.com> wrote:

> This is the best coverage I've seen from a source that would know:
>
> http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/
>
> One relevant quote:
>
> "To store 100 million files (referencing 200 million blocks), a name-node should have at least 60 GB of RAM."
>
> But, honestly, if you're just building out your cluster, you'll probably run into a lot of other limits first: hard drive space, regionserver memory, the infamous ulimit/xciever :), etc.
>
> Take care,
> -stu
>
> --- On Wed, 2/2/11, Dhruba Borthakur <dhr...@gmail.com> wrote:
>
> From: Dhruba Borthakur <dhr...@gmail.com>
> Subject: Re: HDFS without Hadoop: Why?
> To: hdfs-user@hadoop.apache.org
> Date: Wednesday, February 2, 2011, 9:00 PM
>
> The Namenode uses around 160 bytes/file and 150 bytes/block in HDFS. This is a very rough calculation.
>
> dhruba
>
> On Wed, Feb 2, 2011 at 5:11 PM, Dhodapkar, Chinmay <chinm...@qualcomm.com> wrote:
>
> What you describe is pretty much my use case as well. Since I don't know how big the number of files could get, I am trying to figure out if there is a theoretical design limitation in HDFS.
>
> From what I have read, the name node stores all metadata for all files in RAM. Assuming (in my case) that a file is smaller than the configured block size, there should be a very rough formula that can be used to calculate the maximum number of files HDFS can serve based on the RAM configured on the name node?
>
> Can any of the implementers comment on this? Am I even thinking on the right track?
>
> Thanks Ian for the haystack link - very informative indeed.
>
> -Chinmay
>
> From: Stuart Smith [mailto:stu24m...@yahoo.com]
> Sent: Wednesday, February 02, 2011 4:41 PM
> To: hdfs-user@hadoop.apache.org
> Subject: RE: HDFS without Hadoop: Why?
>
> Hello,
> I'm actually using hbase/hadoop/hdfs for lots of small files (with a long tail of larger files). Well, millions of small files - I don't know what you mean by lots :)
>
> Facebook probably knows better, but what I do is:
>
> - store metadata in hbase
> - store files smaller than 10 MB or so in hbase
> - store larger files in an hdfs directory tree.
>
> I started storing files up to 64 MB (the block size) in hbase, but that causes issues with regionservers when running M/R jobs. This is related to the fact that I'm running a cobbled-together cluster and my regionservers don't have that much memory. I would play with the size to see what works for you.
>
> Take care,
> -stu
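A minimal sketch of the size-threshold approach Stuart describes, written against the HTable/Put client API. The "files" table, the "f" column family, the "/archive" directory, and the 10 MB cutoff are made-up examples for illustration, not anything the thread prescribes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SmallFileStore {
    // Illustrative cutoff: payloads at or below this go into an HBase cell,
    // larger ones into a plain HDFS file. Tune it to your regionservers.
    static final long SMALL_FILE_LIMIT = 10L * 1024 * 1024;

    private final Configuration conf = HBaseConfiguration.create();
    private final HTable table;   // hypothetical "files" table
    private final FileSystem fs;

    public SmallFileStore() throws Exception {
        table = new HTable(conf, "files");
        fs = FileSystem.get(conf);
    }

    public void store(String name, byte[] data) throws Exception {
        if (data.length <= SMALL_FILE_LIMIT) {
            // Small payload: one cell in HBase, keyed by the file name.
            Put put = new Put(Bytes.toBytes(name));
            put.add(Bytes.toBytes("f"), Bytes.toBytes("data"), data);
            table.put(put);
        } else {
            // Large payload: a regular file in an HDFS directory tree.
            FSDataOutputStream out = fs.create(new Path("/archive", name));
            try {
                out.write(data);
            } finally {
                out.close();
            }
        }
    }
}
```

The cutoff sits well below the block size because, as Stuart notes, large cells put memory pressure on the regionservers during M/R jobs; it is worth experimenting to find what your hardware tolerates.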
> --- On Wed, 2/2/11, Dhodapkar, Chinmay <chinm...@qualcomm.com> wrote:
>
> From: Dhodapkar, Chinmay <chinm...@qualcomm.com>
> Subject: RE: HDFS without Hadoop: Why?
> To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
> Date: Wednesday, February 2, 2011, 7:28 PM
>
> Hello,
>
> I have been following this thread for some time now. I am very comfortable with the advantages of HDFS, but still have lingering questions about using HDFS for general-purpose storage (no mapreduce/hbase etc.).
>
> Can somebody shed light on what the limits are on the number of files that can be stored? Is it limited in any way by the namenode? The use case I am interested in is storing a very large number of relatively small files (1 MB to 25 MB).
>
> Interestingly, I saw a Facebook presentation on how they use hbase/hdfs internally. They seem to store all metadata in hbase and the actual images/files/etc. in something called "haystack" (why not use hdfs, since they already have it?). Anybody know what "haystack" is?
>
> Thanks!
> Chinmay
>
> From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
> Sent: Wednesday, February 02, 2011 3:31 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: HDFS without Hadoop: Why?
>
> > - Large block size wastes space for small files. The minimum file size is 1 block.
>
> That's incorrect. If a file is smaller than the block size, it will only consume as much space as there is data in the file.
>
> > - There are no hardlinks, softlinks, or quotas.
>
> That's incorrect; there are quotas and softlinks.
>
> --
> Connect to me at http://www.facebook.com/dhruba
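On the block-size point Jeff makes: a quick way to convince yourself is to write a small file and compare its length and block size to the space HDFS actually accounts for it. A sketch, with the path and sizes made up for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallFileSpaceCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a ~1 MB file; the path is illustrative.
        Path p = new Path("/tmp/one-megabyte-file");
        FSDataOutputStream out = fs.create(p);
        out.write(new byte[1024 * 1024]);
        out.close();

        FileStatus st = fs.getFileStatus(p);
        ContentSummary cs = fs.getContentSummary(p);

        // The block size (e.g. 64 MB) is only an upper bound per block;
        // space consumed is the file's length times its replication,
        // not a whole block per file.
        System.out.println("length         = " + st.getLen());
        System.out.println("block size     = " + st.getBlockSize());
        System.out.println("replication    = " + st.getReplication());
        System.out.println("space consumed = " + cs.getSpaceConsumed());
    }
}
```

The per-file cost that does matter for lots of small files is the namenode memory discussed at the top of the thread, not wasted block space on the datanodes.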