Hi,

I am a beginner trying to set up a few simple Hadoop tests on a single
node before moving on to a cluster. I am just using the simple
wordcount example for now. My question is: what is the best way to
guarantee utilization of all cores on a single node? So assuming a
single node with 16 cores, what are the suggested values for:

mapred.map.tasks
mapred.reduce.tasks
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
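
For reference, here is roughly what I have in my conf/mapred-site.xml right now (the values are just my current guesses for a 16-core box, following the thread below, not recommendations; I set reduce.tasks.maximum the same as map.tasks.maximum):

```xml
<!-- conf/mapred-site.xml: current guesses for a 16-core node -->
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>90</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>90</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>100</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>100</value>
  </property>
</configuration>
```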

I found an old, similar thread
http://www.mail-archive.com/[email protected]/msg00152.html
and I have followed similar settings on my 16-core system (e.g.
map.tasks = reduce.tasks = 90 and map.tasks.maximum = 100); however, I
always see only 3-4 cores utilized in top.

- The description for mapred.map.tasks says "Ignored when
mapred.job.tracker is "local"", and in my case
mapred.job.tracker=hdfs://localhost:54311.
Is it possible that the map.tasks and reduce.tasks values I am setting
are being ignored? How can I verify this? Is there a way to enforce my
values even in a localhost scenario like this?

- Are there other config options/values that I need to set besides the
4 I mentioned above?

- Also, is it possible that for short tasks I won't see full
utilization of all cores anyway? Something along those lines is
mentioned in a JIRA issue from a year ago:
http://issues.apache.org/jira/browse/HADOOP-3136
"If the individual tasks are very short i.e. run for less than the
heartbeat interval the TaskTracker serially runs one task at a time"

I am using Hadoop 0.19.2.

thanks for any guidance,

- Vasilis
