Hello, I have been following this thread for some time now. I am very comfortable with the advantages of HDFS, but I still have lingering questions about using HDFS for general-purpose storage (no MapReduce, HBase, etc.).
Can somebody shed light on the limitations on the number of files that can be stored? Is it limited in any way by the NameNode? The use case I am interested in is storing a very large number of relatively small files (1 MB to 25 MB).

Interestingly, I saw a Facebook presentation on how they use HBase/HDFS internally. They seem to store all metadata in HBase and the actual images/files in something called "Haystack" (why not use HDFS, since they already have it?). Anybody know what "Haystack" is?

Thanks!
Chinmay

From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
Sent: Wednesday, February 02, 2011 3:31 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: HDFS without Hadoop: Why?

* Large block size wastes space for small files. The minimum file size is 1 block.

That's incorrect. If a file is smaller than the block size, it will only consume as much space as there is data in the file.

* There are no hardlinks, softlinks, or quotas.

That's incorrect; there are quotas and softlinks.
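On the NameNode question: the practical limit is NameNode heap, since every file, block, and directory is kept as an in-memory object. A commonly cited rule of thumb is roughly 150 bytes of NameNode memory per namespace object; the exact figure varies by Hadoop version and configuration, so the sketch below is a hedged back-of-envelope estimate, not an authoritative sizing tool.

```python
# Back-of-envelope estimate of NameNode heap needed for many small files.
# Assumes ~150 bytes per namespace object (file, block, or directory) --
# a widely quoted rule of thumb, not an exact or version-stable number.

BYTES_PER_OBJECT = 150  # assumed rule of thumb

def namenode_heap_estimate(num_files, avg_blocks_per_file=1):
    """Rough NameNode heap, in bytes, for num_files files.

    Each file contributes one file object plus its block objects.
    Directory objects are ignored for simplicity.
    """
    objects = num_files * (1 + avg_blocks_per_file)
    return objects * BYTES_PER_OBJECT

# Files of 1-25 MB each fit in a single default-sized block, so
# avg_blocks_per_file = 1. For 100 million such files:
est = namenode_heap_estimate(100_000_000)
print(f"~{est / 2**30:.1f} GiB of NameNode heap")  # ~27.9 GiB
```

The point of the arithmetic: the limit scales with the *count* of files, not their total size, which is exactly why a workload of hundreds of millions of small objects strains HDFS and why Facebook built a separate store (Haystack) for photos.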