Re: contrib EC2 with hadoop 0.17

Chris K Wensel Mon, 09 Jun 2008 09:02:29 -0700

Thanks for the description, Chris. Now that I understand the basic
model, I'm starting to see how the configuration is passed to the
slaves using the -d option of ec2-run-instances.


One config question: on our cluster (hadoop 0.17 with
INSTANCE_TYPE="m1.small") the conf/hadoop-default.xml has
mapred.reduce.tasks set to 1, and mapred.map.tasks set to 2.

From experimenting and reading the FAQ, it looks like those numbers
should be higher, unless you have a single-machine cluster. Maybe
there's something I'm missing, but by upping mapred.map.tasks and
mapred.reduce.tasks to 5 and 15 (in our job jar) we're getting much
better performance. Is there a reason hadoop-init doesn't build a
hadoop-site.xml file with higher or configurable values for these
fields?

configuration values should be set in conf/hadoop-site.xml. Those
particular values you are referring to probably should be set per job
and generally don't have anything to do with instance sizes but more
to do with cluster size and the job being run.

different instance sizes have mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum set accordingly (see
hadoop-init), but again might/should be tuned to your application (cpu
or io bound).
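
as a rough illustration of why mapred.reduce.tasks follows cluster size
rather than instance size, here is a minimal sketch of the rule of thumb
from the Hadoop Map/Reduce documentation, which suggests roughly 0.95 or
1.75 times (number of nodes * mapred.tasktracker.reduce.tasks.maximum).
The function name and the example slot counts are hypothetical; check
hadoop-init for the values your instance type actually gets, and treat
the result as a starting point to tune per job.

```python
import math


def suggested_reduce_tasks(nodes, reduce_slots_per_node, factor=0.95):
    """Estimate a per-job value for mapred.reduce.tasks.

    nodes: number of tasktracker nodes in the cluster.
    reduce_slots_per_node: assumed value of
        mapred.tasktracker.reduce.tasks.maximum on each node.
    factor: 0.95 lets all reduces start in a single wave; 1.75 gives
        faster nodes a second wave for better load balancing.
    """
    if nodes < 1 or reduce_slots_per_node < 1:
        raise ValueError("nodes and reduce_slots_per_node must be >= 1")
    total_slots = nodes * reduce_slots_per_node
    # Round down, but never suggest fewer than one reduce task.
    return max(1, math.floor(total_slots * factor))


if __name__ == "__main__":
    # Hypothetical clusters with 2 reduce slots per node.
    for nodes in (1, 5):
        print(nodes, "nodes ->", suggested_reduce_tasks(nodes, 2), "reduces")
```

the result would then be set per job (for example in the job's own
configuration) rather than baked into the cluster-wide defaults.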


ckw

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
