Having experimented some more, I've found that the simple solution is to limit resource usage by limiting the number of map tasks and the memory they are allowed to consume.

I'm specifying the constraints on the command line like this:

    -jobconf mapred.tasktracker.map.tasks.maximum=2 mapred.child.ulimit=1048576

The configuration parameters seem to take. In the job.xml available from the web console, I see these lines:

    mapred.child.ulimit                     1048576
    mapred.tasktracker.map.tasks.maximum    2

The problem is that when there are a large number of map tasks to complete, Hadoop doesn't seem to obey mapred.tasktracker.map.tasks.maximum. Instead, it spawns 8 map tasks per tasktracker, even when I change mapred.tasktracker.map.tasks.maximum to 2 in hadoop-site.xml on the master. The cluster was booted with the setting at 8.

Do I need to change hadoop-site.xml on all the slaves, and restart the task trackers, in order to make the limit apply? That seems unlikely - I'd really like to manage this parameter on a per-job level.

Thanks for any input!

Chris

--
Chris Anderson
http://jchris.mfdz.com