I have been running with 2x replication on a 500 TB cluster with no issues whatsoever. 3x is for the super paranoid.
On Mon, Nov 7, 2011 at 5:06 PM, Ted Dunning <tdunn...@maprtech.com> wrote:
> Depending on which distribution and what your data center power limits are,
> you may save a lot of money by going with machines that have 12 x 2 TB or
> 12 x 3 TB drives. With suitable engineering margins and 3x replication you
> can have 5 TB net data per node and 20 nodes per rack. If you want to go all
> cowboy with 2x replication and little space to spare, then you can double
> that density.
>
> On Monday, November 7, 2011, Rita <rmorgan...@gmail.com> wrote:
>> For a 1 PB installation you would need close to 170 servers with a 12 TB
>> disk pack installed on each (with a replication factor of 2). That's a
>> conservative estimate.
>> CPUs: 4 cores with 16 GB of memory.
>>
>> Namenode: 4 cores with 32 GB of memory should be OK.
>>
>> On Fri, Oct 21, 2011 at 5:40 PM, Steve Ed <sediso...@gmail.com> wrote:
>>> I am a newbie to Hadoop and trying to understand how to size a Hadoop
>>> cluster.
>>>
>>> What factors should I consider when deciding the number of datanodes?
>>>
>>> Datanode configuration? CPU, memory?
>>>
>>> How much memory is required for the namenode?
>>>
>>> My client is looking at 1 PB of usable data and will be running
>>> analytics on TB-size files using MapReduce.
>>>
>>> Thanks
>>>
>>> ... Steve
>>
>> --
>> --- Get your facts first, then you can distort them as you please.--
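The capacity arithmetic in Rita's and Ted's estimates can be reproduced with a small Python sketch. It makes the following assumptions:

* 1 PB = 1000 TB.
* The engineering margin is modeled as a hypothetical `headroom` fraction of raw disk left free.
* The 0.375 value used below is back-solved from Ted's "5 TB net per node" figure for 12 x 2 TB drives at 3x replication. It is not a number anyone stated in the thread.
* The function names are illustrative only.

```python
import math


def net_tb_per_node(raw_tb_per_node: float, replication: int, headroom: float = 0.0) -> float:
    """Usable (net) TB a single datanode contributes after replication.

    headroom is a hypothetical fraction of raw disk deliberately kept free
    (engineering margin for temp/shuffle space, growth, failed disks, etc.).
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    if replication < 1:
        raise ValueError("replication must be >= 1")
    return raw_tb_per_node * (1.0 - headroom) / replication


def datanodes_needed(usable_tb: float, replication: int,
                     raw_tb_per_node: float, headroom: float = 0.0) -> int:
    """Smallest whole number of datanodes that can hold usable_tb of data."""
    return math.ceil(usable_tb / net_tb_per_node(raw_tb_per_node, replication, headroom))


def racks_needed(nodes: int, nodes_per_rack: int = 20) -> int:
    """Racks required, using Ted's figure of 20 nodes per rack by default."""
    return math.ceil(nodes / nodes_per_rack)


if __name__ == "__main__":
    # Rita: 1 PB usable, 2x replication, 12 TB raw per server, no margin.
    print(datanodes_needed(1000, 2, 12))
    # Ted: 12 x 2 TB = 24 TB raw per node, 3x replication, assumed 37.5% margin.
    print(net_tb_per_node(24, 3, 0.375))
    nodes = datanodes_needed(1000, 3, 24, 0.375)
    print(nodes, racks_needed(nodes))
    # Same hardware and margin, 2x replication only.
    print(net_tb_per_node(24, 2, 0.375))
```

With no headroom, 1 PB at 2x replication on 12 TB servers works out to 167 nodes, which matches Rita's "close to 170". Ted's 12 x 2 TB nodes at 3x replication with the assumed margin give 5 TB net each, so 1 PB needs 200 nodes, or 10 racks of 20. Switching those nodes to 2x replication alone raises net capacity to 7.5 TB per node (1.5x). Getting to Ted's "double" also means trimming the margin.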