Hi Russell,

We will soon be in production with OS-virtualized Hadoop deployments alongside 
our existing bare-metal deployments.

We are finding tradeoffs on both sides. On the virtualization side, cluster 
elasticity is easier and deployment times are shorter. Node recovery can be 
faster with VM image restore. VM migration from one server to another makes 
planned hardware upgrades/repairs easier. But there's always the virtualization 
overhead/tax to pay, along with what can be additional multi-VM or 
multi-tenancy overhead.

I have been thinking about experimenting with a topology/rack-level awareness 
scheme in which each physical VM host is mapped to its own nesting level in 
the rack affinity of the Hadoop instances (VMs) running on it.
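
A minimal sketch of what such a mapping could look like, assuming Hadoop's 
script-based rack awareness (a script configured through 
topology.script.file.name that receives node addresses as arguments and prints 
one topology path per address). The addresses, rack names, and host names are 
hypothetical, and whether the extra host level under the rack is honored 
depends on the Hadoop version and block placement policy in use, which is an 
assumption here.

```python
#!/usr/bin/env python
"""Sketch of a topology script that places VMs under their physical host."""
import sys

# Hypothetical inventory: VM address -> (physical rack, physical VM host).
VM_PLACEMENT = {
    "10.0.1.11": ("rack1", "vmhost-a"),
    "10.0.1.12": ("rack1", "vmhost-a"),
    "10.0.1.21": ("rack1", "vmhost-b"),
    "10.0.2.11": ("rack2", "vmhost-c"),
}

# Fallback for unknown addresses; kept at the same depth as known entries
# so that every path has a uniform number of levels.
DEFAULT_LOCATION = "/default-rack/default-vmhost"


def resolve(address):
    """Return the topology path for one VM address."""
    placement = VM_PLACEMENT.get(address)
    if placement is None:
        return DEFAULT_LOCATION
    rack, vm_host = placement
    return "/%s/%s" % (rack, vm_host)


def main(argv):
    # One or more addresses arrive as arguments; print one path per address.
    for address in argv[1:]:
        print(resolve(address))


if __name__ == "__main__":
    main(sys.argv)
```

With this layout, two VMs on the same physical host resolve to the same path, 
so replicas placed on "different" nodes under that path can be recognized as 
sharing hardware.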

~Brad
        
-----Original Message-----
From: Russell Jurney [mailto:russell.jur...@gmail.com] 
Sent: Wednesday, December 14, 2011 1:27 PM
To: common-user@hadoop.apache.org
Subject: Re: More cores Vs More Nodes ?

You're using OS virtualization in your test.  Are you using it in production?

Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com
datasyndrome.com

On Dec 13, 2011, at 5:16 PM, Brad Sarsfield <b...@bing.com> wrote:

> The experiment was done in a cloud-hosted environment running a set of VMs.
>
> ~Brad
>
> -----Original Message-----
> From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> Sent: Tuesday, December 13, 2011 9:46 AM
> To: common-user@hadoop.apache.org
> Subject: Re: More cores Vs More Nodes ?
>
> Hi Brad, how many tasktrackers did you have on each node in both cases?
>
> Thanks,
> Prashant
>
> Sent from my iPhone
>
> On Dec 13, 2011, at 9:42 AM, Brad Sarsfield <b...@bing.com> wrote:
>
>> Praveenesh,
>>
>> Your question is not naïve; in fact, which hardware design is "better" can 
>> be a very difficult question to answer. If you made me pick one without 
>> much information, I'd go for more machines.  But...
>>
>> It all depends; and there is no right answer.... :)
>>
>> More machines
>>  +May run your workload faster
>>  +Will give you a higher degree of reliability protection from node / 
>> hardware / hard drive failure.
>>  +More aggregate IO capabilities
>>  -capex / opex may be higher than allocating more cores
>>
>> More cores
>>  +May run your workload faster
>>  +More cores may allow for more tasks to run on the same machine
>>  +More cores/tasks may reduce network contention, increasing task-to-task 
>> data flow performance.
>>
>> Notice "May run your workload faster" is in both, as it can be very workload 
>> dependent.
>>
>> My Experience:
>> I did a recent experiment and found that given the same number of 
>> cores (64) with the exact same network / machine configuration;
>>  A: I had 8 machines with 8 cores
>>  B: I had 28 machines with 2 cores (and 1x8 core head node)
>>
>> B was able to outperform A by 2x using teragen and terasort. These machines 
>> were running in a virtualized environment; where some of the IO capabilities 
>> behind the scenes were being regulated to 400Mbps per node when running in 
>> the 2 core configuration vs 1Gbps on the 8 core.  So I would expect the 
>> non-throttled scenario to work even better.
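
A rough tally of the figures quoted above, as a sketch only. It assumes the 
stated caps apply per worker node, that configuration A had no separate head 
node (none is mentioned), and it leaves B's head node out of the I/O total 
because its cap is not stated.

```python
# Figures taken from the experiment description; head_node_cores for A is an
# assumption (no head node is mentioned for that configuration).
CONFIGS = {
    "A": {"workers": 8, "cores_per_worker": 8,
          "io_cap_mbps": 1000, "head_node_cores": 0},
    "B": {"workers": 28, "cores_per_worker": 2,
          "io_cap_mbps": 400, "head_node_cores": 8},
}


def total_cores(cfg):
    """Worker cores plus any head-node cores."""
    return cfg["workers"] * cfg["cores_per_worker"] + cfg["head_node_cores"]


def aggregate_io_mbps(cfg):
    """Sum of the per-worker I/O caps (head node excluded)."""
    return cfg["workers"] * cfg["io_cap_mbps"]


for name in sorted(CONFIGS):
    cfg = CONFIGS[name]
    print("%s: %d cores, %d Mbps aggregate I/O cap"
          % (name, total_cores(cfg), aggregate_io_mbps(cfg)))
```

Under those assumptions both setups total 64 cores, while B's workers have 
11200 Mbps of combined capped I/O against 8000 Mbps for A, even with the 
400 Mbps throttle.
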
>>
>> ~Brad
>>
>>
>> -----Original Message-----
>> From: praveenesh kumar [mailto:praveen...@gmail.com]
>> Sent: Monday, December 12, 2011 8:51 PM
>> To: common-user@hadoop.apache.org
>> Subject: More cores Vs More Nodes ?
>>
>> Hey Guys,
>>
>> So I have a very naive question in my mind regarding Hadoop cluster nodes.
>>
>> More cores or more nodes: should I spend money on going from 2-core to 
>> 4-core machines, or on buying more nodes with fewer cores each, e.g. 
>> 2 machines with 2 cores?
>>
>> Thanks,
>> Praveenesh
>>
>
