Nathan,

Great references. There is a good place to put them, too:
http://wiki.apache.org/hadoop/HDFS_Publications
The GPFS and Lustre papers are not there yet, I believe.
Thanks,
--Konstantin

On Thu, Feb 3, 2011 at 10:48 AM, Nathan Rutman <nrut...@gmail.com> wrote:
>
> On Feb 2, 2011, at 6:42 PM, Konstantin Shvachko wrote:
>
> Thanks for the link Stu.
> More details on the limitations are here:
> http://www.usenix.org/publications/login/2010-04/openpdfs/shvachko.pdf
>
> I think that Nathan raised an interesting question, and his assessment
> of HDFS use cases is generally right.
> Some assumptions, though, are outdated at this point, and people have
> mentioned it in the thread.
> We have an append implementation, which allows reopening files for
> updates. We also have symbolic links and quotas (space and name-space).
> The API to HDFS is not POSIX, true. But in addition to FUSE, people
> also use Thrift to access HDFS.
> Most of these features are explained in the HDFS overview paper:
> http://storageconference.org/2010/Papers/MSST/Shvachko.pdf
>
> Stand-alone HDFS is actually used in several places. I like what
> Brian Bockelman at the University of Nebraska does.
> They store CERN data in their cluster, and physicists use Fortran to
> access the data, not map-reduce, as I heard.
> http://storageconference.org/2010/Presentations/MSST/3.Bockelman.pdf
>
> This doesn't seem to mention what storage they're using.
>
>
> With respect to other distributed file systems: HDFS performance was
> compared to PVFS, GPFS, and Lustre. The results were in favor of HDFS.
> See e.g.
> PVFS
> http://www.cs.cmu.edu/~wtantisi/files/hadooppvfs-pdl08.pdf
>
> Some other references for those interested:
> HDFS vs GPFS:
> Cloud analytics: Do we really need to reinvent the storage stack?
> http://www.usenix.org/event/hotcloud09/tech/full_papers/ananthanarayanan.pdf
> Lustre:
> http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf
> Ceph:
> http://www.usenix.org/publications/login/2010-08/openpdfs/maltzahn.pdf
>
> These GPFS and Lustre papers were both favorable toward HDFS because
> they missed a fundamental issue: for the former file systems, network
> speed is critical. HDFS (ideally) doesn't need the network on reads,
> and so is immune to network speed, but also cannot take advantage of
> it. For slow networks (1GigE) this plays into HDFS's strength, but for
> fast networks (10GigE, InfiniBand) the balance tips the other way. (My
> testing: for a heavily loaded network, a 3-4x read speed factor for
> Lustre. For writes, the difference is even more extreme (10x), since
> HDFS has to hop all write data over the network twice.)
>
> Let me say clearly that your choice of FS should depend on which of
> many factors are most important to you -- there is no "one size fits
> all", although that sadly makes our decisions more complex. For those
> using Hadoop who have a high weighting on I/O performance (as well as
> some other factors I listed in my original mail), I suggest you at
> least think about spending money on a fast network and using a FS that
> can utilize it.
>
>
> So I agree with Nathan that HDFS was designed and optimized as a
> storage layer for map-reduce type tasks, but it performs well as a
> general purpose FS as well.
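Nathan's write-traffic point can be checked with quick arithmetic. A minimal sketch (the helper names and the assumption that the writing client runs on a datanode are mine, for illustration; the replication-pipeline behavior is standard HDFS):

```python
# Back-of-the-envelope check of the "write data crosses the network
# twice" claim. Numbers are illustrative, not benchmarks.

REPLICATION = 3  # default HDFS replication factor

def network_bytes_written(file_bytes, client_on_datanode=True):
    """Bytes crossing the network for one HDFS write.

    With the client on a datanode, the first replica is written locally;
    the pipeline then forwards the data to the remaining replicas, so
    each byte crosses the network (REPLICATION - 1) times.
    """
    hops = REPLICATION - 1 if client_on_datanode else REPLICATION
    return file_bytes * hops

def network_bytes_read(file_bytes, replica_is_local=True):
    """Bytes crossing the network for one HDFS read.

    A task scheduled next to a replica reads from local disk and moves
    nothing over the network; a remote client pulls every byte.
    """
    return 0 if replica_is_local else file_bytes

one_gb = 1 << 30
print(network_bytes_written(one_gb) // one_gb)  # 2 -- two network hops
print(network_bytes_read(one_gb))               # 0 -- data-local read
```

This is why a faster network helps Lustre-style file systems on every read and write, while for HDFS it mainly helps the write pipeline.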
> Thanks,
> --Konstantin
>
>
> On Wed, Feb 2, 2011 at 6:08 PM, Stuart Smith <stu24m...@yahoo.com> wrote:
>>
>> This is the best coverage I've seen from a source that would know:
>>
>> http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/
>>
>> One relevant quote:
>>
>> To store 100 million files (referencing 200 million blocks), a
>> name-node should have at least 60 GB of RAM.
>>
>> But, honestly, if you're just building out your cluster, you'll
>> probably run into a lot of other limits first: hard drive space,
>> regionserver memory, the infamous ulimit/xciever :), etc...
>>
>> Take care,
>> -stu
>>
>> --- On Wed, 2/2/11, Dhruba Borthakur <dhr...@gmail.com> wrote:
>>
>> From: Dhruba Borthakur <dhr...@gmail.com>
>> Subject: Re: HDFS without Hadoop: Why?
>> To: hdfs-user@hadoop.apache.org
>> Date: Wednesday, February 2, 2011, 9:00 PM
>>
>> The Namenode uses around 160 bytes/file and 150 bytes/block in HDFS.
>> This is a very rough calculation.
>>
>> dhruba
>>
>> On Wed, Feb 2, 2011 at 5:11 PM, Dhodapkar, Chinmay
>> <chinm...@qualcomm.com> wrote:
>>
>> What you describe is pretty much my use case as well. Since I don't
>> know how big the number of files could get, I am trying to figure out
>> if there is a theoretical design limitation in hdfs…
>>
>> From what I have read, the name node will store all metadata of all
>> files in RAM. Assuming (in my case) that a file is less than the
>> configured block size, there should be a very rough formula that can
>> be used to calculate the max number of files that hdfs can serve,
>> based on the configured RAM on the name node?
>>
>> Can any of the implementers comment on this? Am I even thinking on
>> the right track…?
>>
>> Thanks Ian for the haystack link…very informative indeed.
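[Putting Dhruba's per-object figures next to Stu's quoted 60 GB number, as a rough sketch; the constants come from this thread, the helper name is mine, and the result is a lower bound since it ignores JVM and directory overhead:]

```python
# Rough namenode memory estimate from Dhruba's figures:
# ~160 bytes per file object and ~150 bytes per block object.
# Real heap usage is higher (JVM overhead, directories, datanode maps),
# which is consistent with the "at least 60 GB" guidance quoted above.

BYTES_PER_FILE = 160
BYTES_PER_BLOCK = 150

def namenode_metadata_bytes(num_files, blocks_per_file=2):
    """Lower-bound metadata footprint for a given namespace size."""
    num_blocks = num_files * blocks_per_file
    return num_files * BYTES_PER_FILE + num_blocks * BYTES_PER_BLOCK

# Stu's quoted scenario: 100 million files referencing 200 million blocks.
gb = namenode_metadata_bytes(100_000_000, blocks_per_file=2) / (1 << 30)
print(round(gb, 1))  # ~42.8 GiB of raw metadata; ~60 GB heap in practice
```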
>>
>> -Chinmay
>>
>>
>> From: Stuart Smith [mailto:stu24m...@yahoo.com]
>> Sent: Wednesday, February 02, 2011 4:41 PM
>> To: hdfs-user@hadoop.apache.org
>> Subject: RE: HDFS without Hadoop: Why?
>>
>> Hello,
>> I'm actually using hbase/hadoop/hdfs for lots of small files (with a
>> long tail of larger files). Well, millions of small files - I don't
>> know what you mean by lots :)
>>
>> Facebook probably knows better, but what I do is:
>>
>> - store metadata in hbase
>> - files smaller than 10 MB or so in hbase
>> - larger files in an hdfs directory tree.
>>
>> I started storing 64 MB files and smaller in hbase (chunk size), but
>> that causes issues with regionservers when running M/R jobs. This is
>> related to the fact that I'm running a cobbled-together cluster & my
>> region servers don't have that much memory. I would play with the
>> size to see what works for you.
>>
>> Take care,
>> -stu
>>
>> --- On Wed, 2/2/11, Dhodapkar, Chinmay <chinm...@qualcomm.com> wrote:
>>
>> From: Dhodapkar, Chinmay <chinm...@qualcomm.com>
>> Subject: RE: HDFS without Hadoop: Why?
>> To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
>> Date: Wednesday, February 2, 2011, 7:28 PM
>>
>> Hello,
>>
>> I have been following this thread for some time now. I am very
>> comfortable with the advantages of hdfs, but still have lingering
>> questions about the usage of hdfs for general purpose storage (no
>> mapreduce/hbase etc).
>>
>> Can somebody shed light on what the limitations are on the number of
>> files that can be stored. Is it limited in any way by the namenode?
The use case I
>> am interested in is to store a very large number of relatively small
>> files (1 MB to 25 MB).
>>
>> Interestingly, I saw a Facebook presentation on how they use
>> hbase/hdfs internally. They seem to store all metadata in hbase and
>> the actual images/files/etc in something called "haystack" (why not
>> use hdfs since they already have it?). Anybody know what "haystack"
>> is?
>>
>> Thanks!
>> Chinmay
>>
>>
>> From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
>> Sent: Wednesday, February 02, 2011 3:31 PM
>> To: hdfs-user@hadoop.apache.org
>> Subject: Re: HDFS without Hadoop: Why?
>>
>> - Large block size wastes space for small files. The minimum file
>> size is 1 block.
>>
>> That's incorrect. If a file is smaller than the block size, it will
>> only consume as much space as there is data in the file.
>>
>> - There are no hardlinks, softlinks, or quotas.
>>
>> That's incorrect; there are quotas and softlinks.
>>
>>
>> --
>> Connect to me at http://www.facebook.com/dhruba
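For what it's worth, Stu's hybrid scheme earlier in the thread reduces to a one-line routing rule. A hedged sketch (the threshold value and function name are illustrative, not from any real deployment):

```python
# Route payloads the way Stu describes: metadata always in HBase,
# small payloads in HBase cells, larger payloads as plain HDFS files.

HBASE_THRESHOLD = 10 * 1024 * 1024  # ~10 MB, per Stu's tuning

def choose_store(size_bytes):
    """Decide where a file's payload should live.

    Small files don't waste HDFS *disk* (a file smaller than the block
    size only consumes its actual bytes, as Jeff notes), but each one
    still costs the namenode a file + block object in RAM -- which is
    why packing millions of small payloads into HBase helps.
    """
    return "hbase" if size_bytes <= HBASE_THRESHOLD else "hdfs"

print(choose_store(1 << 20))          # hbase -- 1 MB small file
print(choose_store(25 * (1 << 20)))   # hdfs  -- 25 MB long-tail file
```

The threshold is worth tuning against regionserver memory, as Stu found when he started at the 64 MB chunk size and had to back off.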