Hey Wildan,

HDFS successfully stores well over 50 TB on a single cluster. It's primarily meant to store data that will be analyzed in a MapReduce job, but it can also be used for archival storage. For archiving, you'd probably deploy nodes with lots of disk space rather than lots of RAM and processor power. You'll want to do a cost analysis to determine whether tape or HDFS is cheaper for you.
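As a back-of-the-envelope illustration of that cost analysis (all dollar figures below are hypothetical placeholders, not real quotes; plug in your own vendor pricing), a first-pass comparison might look like:

```python
# Rough cost-comparison sketch: HDFS vs. tape media for ~50 TB of archives.
# All $/TB figures are hypothetical placeholders -- substitute real quotes.

HDFS_REPLICATION = 3       # HDFS default replication factor
DISK_COST_PER_TB = 50.0    # hypothetical $/TB of raw disk
TAPE_COST_PER_TB = 15.0    # hypothetical $/TB of tape media

def hdfs_storage_cost(data_tb, replication=HDFS_REPLICATION,
                      disk_cost_per_tb=DISK_COST_PER_TB):
    """Raw-disk cost of holding data_tb in HDFS.

    Ignores servers, power, and admin time -- include those in a real analysis.
    """
    return data_tb * replication * disk_cost_per_tb

def tape_storage_cost(data_tb, copies=2, tape_cost_per_tb=TAPE_COST_PER_TB):
    """Media cost of keeping `copies` tape copies.

    Ignores drives, libraries, and offsite storage fees.
    """
    return data_tb * copies * tape_cost_per_tb

if __name__ == "__main__":
    data = 50.0  # TB of video archives
    print("HDFS raw-disk cost: $%.2f" % hdfs_storage_cost(data))
    print("Tape media cost:    $%.2f" % tape_storage_cost(data))
```

Note that HDFS's 3x replication triples the raw disk you need, which is why the disk-vs-tape media comparison alone rarely tells the whole story.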
That said, you should know a few things about HDFS:

- Its read path is optimized for high throughput and doesn't care as much about latency (read: it has high latency relative to other file systems).
- It's not meant for small files, so ideally your video files will be at least ~100 MB each.
- It requires that the machines that make up your cluster be running whenever you want to access or store data. (Note that HDFS survives if a small percentage of your nodes go down; it's built with fault tolerance in mind.)

I hope this clears things up. Let me know if you have any other questions.

Alex

On Tue, Jun 16, 2009 at 2:44 AM, W <wilda...@gmail.com> wrote:
> Dear Hadoop Gurus,
>
> After googling, I found some information on using Hadoop as (long-term)
> cloud storage.
> I have to maintain lots of data (around 50 TB), much of it
> TV commercials (video files).
>
> I know the best solution for long-term file archiving is tape
> backup, but I'm curious: can Hadoop
> be used as a 'data archiving' platform?
>
> Thanks!
>
> Warm Regards,
> Wildan
> ---
> OpenThink Labs
> http://openthink-labs.tobethink.com/
>
> Making IT, Business and Education in Harmony
>
> >> 087884599249
>
> Y! : hawking_123
> LinkedIn : http://www.linkedin.com/in/wildanmaulana