Having experimented some more, I've found that the simple solution is
to limit resource usage by limiting the number of map tasks and the
memory each of them is allowed to consume.

I'm specifying the constraints on the command line like this:

-jobconf mapred.tasktracker.map.tasks.maximum=2 mapred.child.ulimit=1048576
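If the job is launched from a script, a small helper can keep the
overrides in one place. This is a minimal sketch that assumes each
-jobconf flag carries exactly one key=value pair, so it emits a separate
flag per property. The helper name jobconf_args is hypothetical and not
part of Hadoop.

```python
def jobconf_args(overrides):
    """Expand a dict of JobConf overrides into streaming command-line args.

    Assumption: each -jobconf flag takes a single key=value pair, so one
    flag is emitted per property (dict insertion order is preserved).
    """
    args = []
    for key, value in overrides.items():
        args.extend(["-jobconf", f"{key}={value}"])
    return args


overrides = {
    "mapred.tasktracker.map.tasks.maximum": 2,
    "mapred.child.ulimit": 1048576,
}
print(" ".join(jobconf_args(overrides)))
# -jobconf mapred.tasktracker.map.tasks.maximum=2 -jobconf mapred.child.ulimit=1048576
```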

The configuration parameters seem to take effect; in the job.xml
available from the web console, I see these lines:

mapred.child.ulimit     1048576
mapred.tasktracker.map.tasks.maximum    2
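The same check can be scripted instead of read off the web console.
Here is a rough sketch, assuming job.xml uses the usual Hadoop
configuration layout (<configuration> holding <property> elements with
<name> and <value>). The sample below contains only the two properties
quoted above; a real job.xml holds many more. The function name
read_job_conf is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_job_conf(xml_text):
    """Return a dict of property name -> value from Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name] = prop.findtext("value")
    return props


# Trimmed sample holding only the two lines seen in the web console.
SAMPLE_JOB_XML = """<?xml version="1.0"?>
<configuration>
  <property><name>mapred.child.ulimit</name><value>1048576</value></property>
  <property><name>mapred.tasktracker.map.tasks.maximum</name><value>2</value></property>
</configuration>"""

conf = read_job_conf(SAMPLE_JOB_XML)
for key in ("mapred.child.ulimit", "mapred.tasktracker.map.tasks.maximum"):
    print(key, conf.get(key))
```

Note that this only confirms what the job's configuration says, which
is exactly what the web console already showed.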

The problem is that when there is a large number of map tasks to
complete, Hadoop doesn't seem to obey
mapred.tasktracker.map.tasks.maximum. Instead, it spawns 8 map tasks
per tasktracker (even when I change mapred.tasktracker.map.tasks.maximum
in hadoop-site.xml on the master to 2). The cluster was booted with the
setting at 8. Do I need to change hadoop-site.xml on all the slaves and
restart the tasktrackers for the limit to apply? That seems unlikely;
I'd really like to manage this parameter on a per-job basis.
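To see why the slot count matters so much here: mapred.child.ulimit is
the per-child virtual memory limit in kilobytes, so 1048576 KB is 1 GiB
per map child. A back-of-the-envelope sketch of the worst case on one
tasktracker, assuming every concurrent map child reaches its limit and
ignoring reduce children and the daemons themselves:

```python
ULIMIT_KB = 1048576  # mapred.child.ulimit, in kilobytes (= 1 GiB)


def worst_case_vm_gib(map_slots, ulimit_kb=ULIMIT_KB):
    """Upper bound (GiB) on virtual memory used by concurrent map children
    on one tasktracker, if each child hits its ulimit."""
    return map_slots * ulimit_kb / (1024 * 1024)


for slots in (2, 8):
    print(f"{slots} map slots -> up to {worst_case_vm_gib(slots):.0f} GiB")
# 2 map slots -> up to 2 GiB
# 8 map slots -> up to 8 GiB
```

So the ulimit alone caps each child, but with 8 slots instead of 2 the
per-node total can still be four times larger.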

Thanks for any input!

Chris

-- 
Chris Anderson
http://jchris.mfdz.com