I agree with Ted's argument that 3x replication is way better than 2x. But I do have to point out that, since 0.20.204, the loss of a disk no longer causes the loss of a whole node (thankfully!) unless it's the system disk. So in the example given, still assuming a disk failure every 2 hours, a single-disk failure means each node only has to re-replicate about 2GB of data, not 12GB. That puts the risk of data loss at about 1 in 72 such failures, rather than 1 in 12. Which is still unacceptable, so use 3x replication! :-) --Matt
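For anyone who wants to plug in their own failure rates, the model Ted and Matt are both using boils down to one line: assume disk failures arrive exponentially (one every ~2 hours across 12,000 disks) and ask how likely another failure is during the re-replication window. A minimal sketch in Python, using the windows from the posts below (~10 minutes for a whole node, about a sixth of that for a single disk); the function name is just for illustration:

import math

def p_second_failure(rereplication_minutes, mean_minutes_between_failures=120.0):
    # Chance that another disk somewhere in the cluster fails before
    # re-replication finishes, assuming exponentially distributed failure
    # arrivals (Ted's "disk failure every ~2 hours" across 12,000 disks).
    return 1.0 - math.exp(-rereplication_minutes / mean_minutes_between_failures)

# Whole-node loss: ~12GB to re-replicate per surviving node, call it ~10 minutes.
print(p_second_failure(10.0))      # ~0.080 -> roughly 1 in 12
# Single-disk loss (0.20.204+): ~2GB per node, about a sixth of the window.
print(p_second_failure(10.0 / 6))  # ~0.014 -> roughly 1 in 72

Stretching the disk MTBF from 3 years to 10 years stretches the mean gap between failures from ~120 to ~400 minutes, which is where Ted's ~2.5% figure for whole-machine loss comes from.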
On Mon, Nov 7, 2011 at 4:53 PM, Ted Dunning <tdunn...@maprtech.com> wrote:
> 3x replication has two effects. One is reliability. This is probably
> more important in large clusters than small.
>
> Another important effect is data locality during map-reduce. Having 3x
> replication allows mappers to have almost all invocations read from local
> disk. 2x replication compromises this. Even where you don't have local
> data, the bandwidth available to read from 3x replicated data is 1.5x the
> bandwidth available for 2x replication.
>
> To get a rough feel for how reliable you should consider a cluster, you
> can do a pretty simple computation. If you have 12 x 2T on a single
> machine and you lose that machine, the remaining copies of that data must
> be replicated before another disk fails. With HDFS and block-level
> replication, the remaining copies will be spread across the entire
> cluster, so any disk failure is reasonably likely to cause data loss. For
> a 1000 node cluster with 12000 disks, it is conservative to estimate a
> disk failure on average every 2 hours. Each node will have to replicate
> about 12GB of data, which will take about 500 seconds, or about 9 or 10
> minutes, if you only use 25% of your network for re-replication. The
> probability of a disk failure during a 10 minute period is
> 1-exp(-10/120) = 8%. This means that roughly 1 in 12 full machine
> failures might cause data loss. You can pick whatever you like for the
> rate at which nodes die, but I don't think that this is acceptable.
>
> My numbers for disk failures are purposely somewhat pessimistic. If you
> change the MTBF for disks to 10 years instead of 3 years, then the
> probability of data loss after a machine failure drops, but only to about
> 2.5%.
>
> Now, I would be the first to say that these numbers feel too high, but I
> also would rather not experience enough data loss events to have a
> reliable gut feel for how often they should occur.
>
> My feeling is that 2x is fine for data you can reconstruct and which you
> don't need to read really fast, but not good enough for data whose loss
> will get you fired.
>
> On Mon, Nov 7, 2011 at 7:34 PM, Rita <rmorgan...@gmail.com> wrote:
>
>> I have been running with 2x replication on a 500tb cluster. No issues
>> whatsoever. 3x is for the super paranoid.
>>
>> On Mon, Nov 7, 2011 at 5:06 PM, Ted Dunning <tdunn...@maprtech.com> wrote:
>>
>>> Depending on which distribution and what your data center power limits
>>> are, you may save a lot of money by going with machines that have 12 x
>>> 2 or 3 tb drives. With suitable engineering margins and 3x replication
>>> you can have 5 tb net data per node and 20 nodes per rack. If you want
>>> to go all cowboy with 2x replication and little space to spare then you
>>> can double that density.
>>>
>>> On Monday, November 7, 2011, Rita <rmorgan...@gmail.com> wrote:
>>> > For a 1PB installation you would need close to 170 servers with 12 TB
>>> > disk packs installed on them (with a replication factor of 2). That's
>>> > a conservative estimate.
>>> > CPUs: 4 cores with 16gb of memory
>>> >
>>> > Namenode: 4 cores with 32gb of memory should be ok.
>>> >
>>> > On Fri, Oct 21, 2011 at 5:40 PM, Steve Ed <sediso...@gmail.com> wrote:
>>> >>
>>> >> I am a newbie to Hadoop and trying to understand how to size a
>>> >> Hadoop cluster.
>>> >>
>>> >> What factors should I consider in deciding the number of datanodes?
>>> >>
>>> >> Datanode configuration? CPU, memory?
>>> >>
>>> >> Amount of memory required for the namenode?
>>> >>
>>> >> My client is looking at 1 PB of usable data and will be running
>>> >> analytics on TB-size files using mapreduce.
>>> >>
>>> >> Thanks
>>> >> ….. Steve
>>> >
>>> > --
>>> > --- Get your facts first, then you can distort them as you please.--
>>
>> --
>> --- Get your facts first, then you can distort them as you please.--
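Ted's sizing rule of thumb in the quoted thread (12 x 2TB drives, 3x replication, ~5TB net per node, 20 nodes per rack) works out the same back-of-the-envelope way. A rough sketch, where the 0.625 usable fraction is an assumed engineering margin (OS, temp/shuffle space, free headroom) picked so the result lands near his 5TB figure, not a number stated in the thread:

DRIVES_PER_NODE = 12
TB_PER_DRIVE = 2.0        # Ted suggests 2 or 3 TB drives
REPLICATION = 3           # 2x with a thinner margin is Ted's "cowboy" option
USABLE_FRACTION = 0.625   # assumed margin for OS, temp/shuffle space, headroom

raw_per_node = DRIVES_PER_NODE * TB_PER_DRIVE                # 24 TB raw
net_per_node = raw_per_node * USABLE_FRACTION / REPLICATION  # ~5 TB net
nodes_for_1pb = 1000.0 / net_per_node                        # ~200 nodes
racks = nodes_for_1pb / 20                                   # ~10 racks at 20 nodes/rack

print(net_per_node, nodes_for_1pb, racks)

This also squares roughly with Rita's estimate of close to 170 servers for 1PB at 2x: 2PB of raw replicas spread over ~12TB of disk per node comes out to about 170 nodes before any margin.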