Re: Hadoop benchmarking
Owen, one problem with Arun's slide deck is that while it lists the parameters that matter, it doesn't list suggested values for them. Do you have any guide about that? In particular, the only places I know of that talk about how to set these parameters are http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/ and http://wiki.apache.org/hadoop/FAQ#3.

On Wed, Jun 10, 2009 at 12:14 PM, Owen O'Malley wrote:
> Take a look at Arun's slide deck on Hadoop performance:
>
> http://bit.ly/EDCg3
>
> It is important to get io.sort.mb large enough, and io.sort.factor should
> be closer to 100 than 10. I'd also use large block sizes to reduce the
> number of maps. Please see the deck for other important factors.
>
> -- Owen
Re: Hadoop benchmarking
Take a look at Arun's slide deck on Hadoop performance:

http://bit.ly/EDCg3

It is important to get io.sort.mb large enough, and io.sort.factor should be closer to 100 than 10. I'd also use large block sizes to reduce the number of maps. Please see the deck for other important factors.

-- Owen
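As a rough illustration, the advice above might translate into a configuration fragment like the following. The specific values here (a 256 MB sort buffer, 128 MB blocks) are assumptions for the sake of the example, not recommendations from the deck; tune them against your own hardware and job mix:

```
<!-- Illustrative values only -- not from Arun's deck; tune per cluster. -->
<property>
  <name>io.sort.mb</name>
  <value>256</value>
  <!-- assumed: a sort buffer large enough to reduce map-side spills -->
</property>
<property>
  <name>io.sort.factor</name>
  <value>100</value>
  <!-- merge ~100 streams per pass, per the suggestion above -->
</property>
<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
  <!-- assumed: 128 MB blocks (in bytes) to reduce the number of maps -->
</property>
```

Note that io.sort.mb must stay comfortably below the task heap set by mapred.child.java.opts, since the sort buffer is allocated inside the map task's JVM.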
Re: Hadoop benchmarking
Hi Stephen,

That will set the maximum allowable heap, but it doesn't necessarily tell Hadoop's internal systems to take advantage of it. There are a number of other settings that adjust performance. At Cloudera we have a config tool that generates Hadoop configurations with reasonable first-approximation values for your cluster -- check out http://my.cloudera.com and look at the hadoop-site.xml it generates. If you start from there you might find a better parameter space to explore. Please share back your findings -- we'd love to tweak the tool even more with some external feedback :)

- Aaron

On Wed, Jun 10, 2009 at 7:39 AM, stephen mulcahy wrote:
> Hi,
>
> I'm currently doing some testing of different configurations using the
> Hadoop sort as follows:
>
> bin/hadoop jar hadoop-*-examples.jar randomwriter
> -Dtest.randomwrite.total_bytes=107374182400 /benchmark100
>
> bin/hadoop jar hadoop-*-examples.jar sort /benchmark100 rand-sort
>
> The only changes I've made from the standard config are the following in
> conf/mapred-site.xml:
>
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx1024M</value>
> </property>
>
> <property>
>   <name>mapred.tasktracker.map.tasks.maximum</name>
>   <value>8</value>
> </property>
>
> <property>
>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>   <value>4</value>
> </property>
>
> I'm running this on 4 systems, each with 8 processor cores and 4 separate
> disks.
>
> Is there anything else I should change to stress memory more? The systems
> in question have 16GB of memory but the most that's getting used during a
> run of this benchmark is about 2GB (and most of that seems to be OS
> caching).
>
> Thanks,
>
> -stephen
>
> --
> Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
> NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
> http://di2.deri.ie  http://webstar.deri.ie  http://sindice.com
Hadoop benchmarking
Hi,

I'm currently doing some testing of different configurations using the Hadoop sort as follows:

bin/hadoop jar hadoop-*-examples.jar randomwriter -Dtest.randomwrite.total_bytes=107374182400 /benchmark100

bin/hadoop jar hadoop-*-examples.jar sort /benchmark100 rand-sort

The only changes I've made from the standard config are the following in conf/mapred-site.xml:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024M</value>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>

I'm running this on 4 systems, each with 8 processor cores and 4 separate disks.

Is there anything else I should change to stress memory more? The systems in question have 16GB of memory but the most that's getting used during a run of this benchmark is about 2GB (and most of that seems to be OS caching).

Thanks,

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie  http://webstar.deri.ie  http://sindice.com
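One direction for stressing memory, consistent with the replies earlier in the thread, is to give each task a larger heap and a larger in-memory sort buffer. A hedged sketch of what that could look like in mapred-site.xml -- the 2048 MB heap and 512 MB io.sort.mb are assumed example values, not tested recommendations:

```
<!-- Assumed example values; verify the totals fit your nodes' RAM. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048M</value>
  <!-- assumed: larger per-task heap so tasks can actually use more memory -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>512</value>
  <!-- assumed: bigger map-side sort buffer, allocated inside the task heap -->
</property>
```

Keep the slot counts in mind when sizing the heap: with 8 map slots and 4 reduce slots per node, 12 concurrent tasks at 2 GB each could demand up to 24 GB in the worst case, which exceeds the 16 GB per node described above, so the heap size and task maxima need to be balanced against each other.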