Matthew,
I understood that Juan was talking about a 2 socket quad core box. We run boxes with the e5500 (xeon quad core ) chips. Linux sees these as 16 cores. Our data nodes are 32GB Ram w 4 x 2TB SATA. Its a pretty basic configuration. What I was saying was that if you consider 1 core for each TT, DN and RS jobs, thats 3 out of the 8 physical cores, leaving you 5 cores or 10 'hyperthread cores'. So you could put up 10 m/r slots on the machine. Note that on the main tasks (TT, DN, RS) I dedicate the physical core. Of course your mileage may vary if you're doing non-standard or normal things. A good starting point is 6 mappers and 4 reducers. And of course YMMV depending on if you're using MapR's release, Cloudera, and if you're running HBase or something else on the cluster. >From our experience... we end up getting disk I/O bound first, and then >network or memory becomes the next constraint. Really the xeon chipsets are >really good. HTH -Mike > From: matthew.go...@monsanto.com > To: common-user@hadoop.apache.org > Subject: RE: Performance Tunning > Date: Tue, 28 Jun 2011 14:46:40 +0000 > > Mike, > > I'm not really sure I have seen a community consensus around how to handle > hyper-threading within Hadoop (although I have seen quite a few articles that > discuss it). I was assuming that when Juan mentioned they were 4-core boxes > that he meant 4 physical cores and not HT cores. I was more stating that the > starting point should be 1 slot per thread (or hyper-threaded core) but > obviously reviewing the results from ganglia, or any other monitoring > solution, will help you come up with a more concrete configuration based on > the load. > > My brain might not be working this morning but how did you get the 10 slots > again? That seems low for an 8 physical core box but somewhat overextending > for a 4 physical core box. > > Matt > > -----Original Message----- > From: im_gu...@hotmail.com [mailto:im_gu...@hotmail.com] On Behalf Of Michel > Segel > Sent: Tuesday, June 28, 2011 7:39 AM > To: common-user@hadoop.apache.org > Subject: Re: Performance Tunning > > Matt, > You have 2 threads per core, so your Linux box thinks an 8 core box has16 > cores. In my calcs, I tend to take a whole core for TT DN and RS and then a > thread per slot so you end up w 10 slots per node. Of course memory is also a > factor. > > Note this is only a starting point.you can always tune up. > > Sent from a remote device. Please excuse any typos... > > Mike Segel > > On Jun 27, 2011, at 11:11 PM, "GOEKE, MATTHEW (AG/1000)" > <matthew.go...@monsanto.com> wrote: > > > Per node: 4 cores * 2 processes = 8 slots > > Datanode: 1 slot > > Tasktracker: 1 slot > > > > Therefore max of 6 slots between mappers and reducers. > > > > Below is part of our mapred-site.xml. The thing to keep in mind is the > > number of maps is defined by the number of input splits (which is defined > > by your data) so you only need to worry about setting the maximum number of > > concurrent processes per node. In this case the property you want to hone > > in on is mapred.tasktracker.map.tasks.maximum and > > mapred.tasktracker.reduce.tasks.maximum. Keep in mind there are a LOT of > > other tuning improvements that can be made but it requires an strong > > understanding of your job load. > > > > <configuration> > > <property> > > <name>mapred.tasktracker.map.tasks.maximum</name> > > <value>2</value> > > </property> > > > > <property> > > <name>mapred.tasktracker.reduce.tasks.maximum</name> > > <value>1</value> > > </property> > > > > <property> > > <name>mapred.child.java.opts</name> > > <value>-Xmx512m</value> > > </property> > > > > <property> > > <name>mapred.compress.map.output</name> > > <value>true</value> > > </property> > > > > <property> > > <name>mapred.output.compress</name> > > <value>true</value> > > </property> > > > > > This e-mail message may contain privileged and/or confidential information, > and is intended to be received only by persons entitled > to receive such information. If you have received this e-mail in error, > please notify the sender immediately. Please delete it and > all attachments from any servers, hard drives or any other media. Other use > of this e-mail by you is strictly prohibited. > > All e-mails and attachments sent and received are subject to monitoring, > reading and archival by Monsanto, including its > subsidiaries. The recipient of this e-mail is solely responsible for checking > for the presence of "Viruses" or other "Malware". > Monsanto, along with its subsidiaries, accepts no liability for any damage > caused by any such code transmitted by or accompanying > this e-mail or any attachment. > > > The information contained in this email may be subject to the export control > laws and regulations of the United States, potentially > including but not limited to the Export Administration Regulations (EAR) and > sanctions regulations issued by the U.S. Department of > Treasury, Office of Foreign Asset Controls (OFAC). As a recipient of this > information you are obligated to comply with all > applicable U.S. export laws and regulations. >