Underlying network bandwidth and rack locality, as well as the operational overhead of managing the machines. Past a certain scale, there will almost always be at least one machine failing at any given time.
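To put a rough number on that last point, here is a quick back-of-envelope sketch in plain Python. The 0.1% per-node daily failure probability and the independence assumption are purely illustrative, not measured rates:

# Probability that at least one node in the cluster is failing on a
# given day, assuming independent failures and an illustrative 0.1%
# per-node daily failure probability.
p_node = 0.001  # assumed per-node daily failure probability (illustrative)

for nodes in (100, 1000, 4000):
    p_any = 1 - (1 - p_node) ** nodes
    print(f"{nodes:5d} nodes -> P(at least one failing) = {p_any:.1%}")

With those assumptions this works out to roughly 10% at 100 nodes, 63% at 1,000 nodes, and 98% at 4,000 nodes, so constant partial failure becomes the normal operating condition.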
On Sun, Feb 21, 2010 at 7:54 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> ---------- Forwarded message ----------
> From: Jeff Zhang <zjf...@gmail.com>
> Date: Sun, Feb 21, 2010 at 7:49 AM
> Subject: What is the biggest problem of extremely large hadoop cluster ?
> To: hdfs-...@hadoop.apache.org
>
> Hi,
>
> I am curious to know what the biggest problems of an extremely large Hadoop
> cluster are. What I can imagine now is the memory cost of the HDFS metadata
> in the NameNode. One solution I can think of is to use another storage
> implementation, such as a database, to store the metadata, although that has
> a performance cost. Are there any other solutions, or any other problems with
> an extremely large Hadoop cluster?
>
> --
> Best Regards
>
> Jeff Zhang

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
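P.S. On the NameNode metadata cost raised in the quoted question, a rough back-of-envelope sketch, assuming the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file, directory, or block); the file counts and the one-block-per-file ratio below are illustrative assumptions only:

# Back-of-envelope NameNode heap estimate, assuming ~150 bytes of heap
# per namespace object (file, directory, or block) and roughly one block
# per file; both figures are rough assumptions for illustration.
BYTES_PER_OBJECT = 150

for files in (10_000_000, 100_000_000, 500_000_000):
    objects = files * 2  # one file entry plus ~one block entry each
    heap_gb = objects * BYTES_PER_OBJECT / 1e9
    print(f"{files:>11,} files -> ~{heap_gb:.0f} GB of NameNode heap")

Under those assumptions, 10 million files need on the order of 3 GB of heap, 100 million around 30 GB, and 500 million around 150 GB, which is why the namespace size, rather than raw storage capacity, tends to become the limiting factor first.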