Re: Hadoop benchmarking

2009-06-11 Thread Matei Zaharia
Owen, one problem with Arun's slide deck is that while it lists the
parameters that matter, it doesn't list suggested values for them. Do you
have any guidance on that? In particular, the only places I know of that
discuss how to set these parameters are
http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/
and http://wiki.apache.org/hadoop/FAQ#3.

On Wed, Jun 10, 2009 at 12:14 PM, Owen O'Malley wrote:

> Take a look at Arun's slide deck on Hadoop performance:
>
> http://bit.ly/EDCg3
>
> It is important to make io.sort.mb large enough, and io.sort.factor should
> be closer to 100 than the default of 10. I'd also use large block sizes to
> reduce the number of maps. Please see the deck for other important factors.
>
> -- Owen
>


Re: Hadoop benchmarking

2009-06-10 Thread Owen O'Malley

Take a look at Arun's slide deck on Hadoop performance:

http://bit.ly/EDCg3

It is important to make io.sort.mb large enough, and io.sort.factor
should be closer to 100 than the default of 10. I'd also use large block
sizes to reduce the number of maps. Please see the deck for other
important factors.
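
As a rough sketch (illustrative starting values only, not tuned
recommendations), that advice translates into something like the
following in mapred-site.xml; note that io.sort.mb has to fit within the
task's -Xmx heap, and the block size is an HDFS-side setting:

  <property>
    <name>io.sort.mb</name>
    <value>200</value>        <!-- default is 100; must fit in the task heap -->
  </property>

  <property>
    <name>io.sort.factor</name>
    <value>100</value>        <!-- default is 10 -->
  </property>

  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>  <!-- illustrative: 128MB blocks mean fewer, larger maps -->
  </property>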


-- Owen


Re: Hadoop benchmarking

2009-06-10 Thread Aaron Kimball
Hi Stephen,

That will set the maximum heap allowable, but it doesn't necessarily tell
Hadoop's internal systems to take advantage of it. There are a number of
other settings that affect performance. At Cloudera we have a config tool
that generates Hadoop configurations with reasonable first-approximation
values for your cluster -- check out http://my.cloudera.com and look at the
hadoop-site.xml it generates. If you start from there, you might find a
better parameter space to explore. Please share back your findings -- we'd
love to tweak the tool even more with some external feedback :)
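
As a concrete illustration (values are a sketch, not tuned advice): with
the default io.sort.mb of 100, a 1GB child heap mostly goes unused during
the map-side sort, so you'd raise the sort buffer alongside the heap, e.g.:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024M</value>
  </property>

  <property>
    <name>io.sort.mb</name>
    <value>400</value>  <!-- default is 100; raised so map tasks actually use the larger heap -->
  </property>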

- Aaron


On Wed, Jun 10, 2009 at 7:39 AM, stephen mulcahy wrote:

> Hi,
>
> I'm currently doing some testing of different configurations using the
> Hadoop sort example as follows:
>
> bin/hadoop jar hadoop-*-examples.jar randomwriter
> -Dtest.randomwrite.total_bytes=107374182400 /benchmark100
>
> bin/hadoop jar hadoop-*-examples.jar sort /benchmark100 rand-sort
>
> The only changes I've made from the standard config are the following in
> conf/mapred-site.xml:
>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value>-Xmx1024M</value>
>   </property>
>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <value>8</value>
>   </property>
>
>   <property>
>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>     <value>4</value>
>   </property>
>
> I'm running this on 4 systems, each with 8 processor cores and 4 separate
> disks.
>
> Is there anything else I should change to stress memory more? The systems
> in question have 16GB of memory, but the most that's getting used during a
> run of this benchmark is about 2GB (and most of that seems to be OS
> caching).
>
> Thanks,
>
> -stephen
>
> --
> Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
> NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
> http://di2.deri.ie http://webstar.deri.ie http://sindice.com
>


Hadoop benchmarking

2009-06-10 Thread stephen mulcahy

Hi,

I'm currently doing some testing of different configurations using the
Hadoop sort example as follows:


bin/hadoop jar hadoop-*-examples.jar randomwriter 
-Dtest.randomwrite.total_bytes=107374182400 /benchmark100


bin/hadoop jar hadoop-*-examples.jar sort /benchmark100 rand-sort

The only changes I've made from the standard config are the following in
conf/mapred-site.xml:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024M</value>
  </property>

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>

I'm running this on 4 systems, each with 8 processor cores and 4 
separate disks.


Is there anything else I should change to stress memory more? The systems
in question have 16GB of memory, but the most that's getting used during a
run of this benchmark is about 2GB (and most of that seems to be OS
caching).


Thanks,

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com