Mike,

This is somewhat of a tangent, but it is very informative to hear that you are 
getting I/O bound with a 2:1 core-to-disk ratio. Could you share what you used 
to make that call? We have been using both a local Ganglia daemon and the 
Hadoop Ganglia daemon to get an overall look at the cluster. I would assume the 
items of interest are CPU I/O wait and the throughput of block operations.
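
For a quick spot check of CPU I/O wait on a single node without Ganglia, 
something like the sketch below should work on Linux. It is only an 
illustration and not what either of us is actually running: the function names 
are made up, and it assumes the standard /proc/stat layout where the aggregate 
"cpu" line lists user, nice, system, idle, iowait, irq, softirq and steal 
counters in that order.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:]]


def iowait_percent(before, after):
    """Percent of CPU time spent in iowait between two 'cpu' line samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    # Use only user..steal (first 8 counters); guest time is already
    # included in the user counter, so adding it would double count.
    deltas = [y - x for x, y in zip(b[:8], a[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return iowait percent."""
    def read_first_line():
        with open(path) as f:
            return f.readline()

    before = read_first_line()
    time.sleep(interval)
    return iowait_percent(before, read_first_line())
```

Calling sample_iowait(5.0) on a data node during a heavy job gives a rough 
number to compare against what Ganglia reports; it says nothing about block 
operation throughput, which would need a separate check.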

The disconnect on my side was that I didn't realize you were dedicating a 
physical core per daemon. I am a little surprised that you found that necessary, 
but after seeing some of the metrics from my own stress testing I am noticing 
that we might be overextending with our config under heavy load. Unfortunately 
I am working with lower-spec hardware at the moment, so I don't have the 
headroom to test that out.

Matt

-----Original Message-----
From: Michael Segel [mailto:michael_se...@hotmail.com] 
Sent: Tuesday, June 28, 2011 1:31 PM
To: common-user@hadoop.apache.org
Subject: RE: Performance Tunning



Matthew,

I understood that Juan was talking about a 2-socket quad-core box. We run 
boxes with the E5500 (Xeon quad-core) chips. Linux sees these as 16 cores. 
Our data nodes have 32 GB RAM with 4 x 2 TB SATA. It's a pretty basic 
configuration. 

What I was saying was that if you reserve 1 core each for the TaskTracker (TT), 
DataNode (DN), and RegionServer (RS) daemons, that's 3 of the 8 physical cores, 
leaving you 5 cores, or 10 'hyperthread cores'.
So you could put up 10 m/r slots on the machine. Note that for the main daemons 
(TT, DN, RS) I dedicate a full physical core each.
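
The arithmetic above can be sketched as a small helper. This is only a 
starting-point estimate under the assumptions in this thread (2 hardware 
threads per core, one slot per remaining thread); the function and parameter 
names are made up for illustration, and memory or disk limits are not modeled.

```python
def mapreduce_slots(physical_cores, daemons, threads_per_core=2,
                    reserve_whole_core=True):
    """Estimate map/reduce slots per node as a starting point for tuning.

    reserve_whole_core=True  -> each daemon gets a full physical core.
    reserve_whole_core=False -> each daemon gets a single hardware thread.
    """
    if reserve_whole_core:
        slots = (physical_cores - daemons) * threads_per_core
    else:
        slots = physical_cores * threads_per_core - daemons
    if slots <= 0:
        raise ValueError("no hardware threads left for map/reduce slots")
    return slots


# 8 physical cores, TT + DN + RS each on a dedicated core
print(mapreduce_slots(8, daemons=3))                            # 10
# 4 physical cores, DN + TT each taking one thread
print(mapreduce_slots(4, daemons=2, reserve_whole_core=False))  # 6
```

The second call reproduces the one-thread-per-daemon calculation from earlier 
in the thread, which is where the 6-slot figure came from.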

Of course your mileage may vary if you're doing anything non-standard. 
A good starting point is 6 mappers and 4 reducers. 
It also depends on whether you're using MapR's release or Cloudera's, and 
whether you're running HBase or something else on the cluster.

From our experience, we end up getting disk I/O bound first, and then 
network or memory becomes the next constraint. The Xeon chipsets are 
really good. 

HTH

-Mike


> From: matthew.go...@monsanto.com
> To: common-user@hadoop.apache.org
> Subject: RE: Performance Tunning
> Date: Tue, 28 Jun 2011 14:46:40 +0000
> 
> Mike,
> 
> I'm not really sure I have seen a community consensus on how to handle 
> hyper-threading within Hadoop (although I have seen quite a few articles that 
> discuss it). I was assuming that when Juan mentioned they were 4-core boxes 
> he meant 4 physical cores and not HT cores. My point was that the 
> starting point should be 1 slot per thread (or hyper-threaded core), but 
> reviewing the results from Ganglia, or any other monitoring 
> solution, will help you come up with a more concrete configuration based on 
> the load.
> 
> My brain might not be working this morning but how did you get the 10 slots 
> again? That seems low for an 8 physical core box but somewhat overextending 
> for a 4 physical core box.
> 
> Matt
> 
> -----Original Message-----
> From: im_gu...@hotmail.com [mailto:im_gu...@hotmail.com] On Behalf Of Michel 
> Segel
> Sent: Tuesday, June 28, 2011 7:39 AM
> To: common-user@hadoop.apache.org
> Subject: Re: Performance Tunning
> 
> Matt,
> You have 2 threads per core, so your Linux box thinks an 8-core box has 16 
> cores. In my calcs, I tend to take a whole core each for TT, DN, and RS, and 
> then a thread per slot, so you end up with 10 slots per node. Of course 
> memory is also a factor.
> 
> Note this is only a starting point. You can always tune up. 
> 
> Sent from a remote device. Please excuse any typos...
> 
> Mike Segel
> 
> On Jun 27, 2011, at 11:11 PM, "GOEKE, MATTHEW (AG/1000)" 
> <matthew.go...@monsanto.com> wrote:
> 
> > Per node: 4 cores * 2 processes = 8 slots
> > Datanode: 1 slot
> > Tasktracker: 1 slot
> > 
> > Therefore max of 6 slots between mappers and reducers.
> > 
> > Below is part of our mapred-site.xml. The thing to keep in mind is that the 
> > number of maps is defined by the number of input splits (which is defined 
> > by your data), so you only need to worry about setting the maximum number of 
> > concurrent processes per node. In this case the properties you want to home 
> > in on are mapred.tasktracker.map.tasks.maximum and 
> > mapred.tasktracker.reduce.tasks.maximum. Keep in mind there are a LOT of 
> > other tuning improvements that can be made, but they require a strong 
> > understanding of your job load.
> > 
> > <configuration>
> >  <property>
> >    <name>mapred.tasktracker.map.tasks.maximum</name>
> >    <value>2</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >    <value>1</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.child.java.opts</name>
> >    <value>-Xmx512m</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.compress.map.output</name>
> >    <value>true</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.output.compress</name>
> >    <value>true</value>
> >  </property>
> > 
> > 
                                          
