Hi, I am a beginner trying to set up a few simple Hadoop tests on a single node before moving on to a cluster. I am just using the simple wordcount example for now. My question is: what is the best way to guarantee utilization of all cores on a single node? So, assuming a single node with 16 cores, what are the suggested values for:
mapred.map.tasks
mapred.reduce.tasks
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

I found an old similar thread http://www.mail-archive.com/[email protected]/msg00152.html and I have followed similar settings on my 16-core system (e.g. map.tasks=reduce.tasks=90 and map.tasks.maximum=100), however I always see only 3-4 cores utilized in top.

- The description for mapred.map.tasks says "Ignored when mapred.job.tracker is "local"", and in my case mapred.job.tracker=hdfs://localhost:54311. Is it possible that the map.tasks and reduce.tasks values I am setting are being ignored? How can I verify this? Is there a way to enforce my values even in a localhost scenario like this?
- Are there other config options/values that I need to set besides the four I mentioned above?
- Also, is it possible that for short tasks I won't see full utilization of all cores anyway? Something along those lines is mentioned in an issue from a year ago: http://issues.apache.org/jira/browse/HADOOP-3136 "If the individual tasks are very short i.e. run for less than the heartbeat interval the TaskTracker serially runs one task at a time"

I am using hadoop-0.19.2.

Thanks for any guidance,
- Vasilis
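For reference, here is a sketch of the relevant entries in my hadoop-site.xml (the values 16/90 are just what I have been experimenting with, not recommendations, and I am not sure they are all taking effect):

```xml
<!-- hadoop-site.xml (Hadoop 0.19): the settings discussed above.
     All values here are experimental guesses for a 16-core node. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:54311</value>
  </property>
  <property>
    <!-- as I understand it this is only a hint; the actual number
         of map tasks depends on the input splits -->
    <name>mapred.map.tasks</name>
    <value>90</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>90</value>
  </property>
  <property>
    <!-- per-TaskTracker limit on simultaneously running map tasks -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>16</value>
  </property>
  <property>
    <!-- per-TaskTracker limit on simultaneously running reduce tasks -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>16</value>
  </property>
</configuration>
```

I have also tried overriding some of these per-job on the command line (assuming the wordcount example goes through ToolRunner so that generic -D options are picked up), e.g. bin/hadoop jar hadoop-0.19.2-examples.jar wordcount -D mapred.reduce.tasks=16 input output — but I still see the same 3-4 cores in top.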
