Hello,

We have a 12-node Hadoop cluster running Hadoop 0.20.2-cdh3u0. Each
node has 8 cores and 144GB of RAM (don't ask). I want to take advantage of
all this RAM and run the map-reduce jobs mostly in memory with no spill, if
possible. We use Hive for most of our processing. I have set:
mapred.tasktracker.map.tasks.maximum = 16
mapred.tasktracker.reduce.tasks.maximum = 8
mapred.child.java.opts = 6144
mapred.reduce.parallel.copies = 20
mapred.job.shuffle.merge.percent = 1.0
mapred.job.reduce.input.buffer.percent = 0.25
mapred.inmem.merge.threshold = 0
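
As a quick sanity check on the slot and heap settings above, here is a rough
memory budget in Python. It is only a sketch: it assumes the value 6144 is
meant as a 6144 MB heap per task JVM (the property normally carries a JVM
flag such as -Xmx6144m), and it ignores the memory needed by the OS, the
DataNode and the TaskTracker. The function name is illustrative.

```python
# Worst-case heap demand per node if every map and reduce slot is busy.
# Assumption: mapred.child.java.opts = 6144 means a 6144 MB heap per task JVM.
MAP_SLOTS = 16        # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS = 8      # mapred.tasktracker.reduce.tasks.maximum
CHILD_HEAP_MB = 6144  # assumed meaning of mapred.child.java.opts
NODE_RAM_GB = 144


def worst_case_heap_mb(map_slots, reduce_slots, heap_mb):
    """Total heap requested when all task slots run at the same time."""
    return (map_slots + reduce_slots) * heap_mb


total_mb = worst_case_heap_mb(MAP_SLOTS, REDUCE_SLOTS, CHILD_HEAP_MB)
node_ram_mb = NODE_RAM_GB * 1024
headroom_mb = node_ram_mb - total_mb

print(f"task heap if all slots are busy: {total_mb} MB")
print(f"physical RAM per node:           {node_ram_mb} MB")
print(f"headroom for everything else:    {headroom_mb} MB")
```

Under that assumption, 24 slots at 6144 MB each add up to the full 144GB of
a node, which would leave no headroom for anything else on the machine.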

One of my Hive queries produces six map-reduce stages. In the third
stage, which reads from a 200GB table, the last 14 reducers hang. I
set mapred.task.timeout to 0 to see whether they really hang. It has been 5
hours, so something is terribly wrong in my setup. Part of the log is below.
My questions:
* How should I configure the reducers to run in memory?
* Why do they keep waiting for map outputs?
* What does "dup hosts" mean?

Thank you!
N.Gesli



2011-10-27 16:35:24,304 WARN org.apache.hadoop.util.NativeCodeLoader: Unable
to load native-hadoop library for your platform... using builtin-java
classes where applicable

2011-10-27 16:35:24,611 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2011-10-27 16:35:24,722 INFO org.apache.hadoop.mapred.ReduceTask:
ShuffleRamManager: MemoryLimit=1503238528,
MaxSingleShuffleLimit=375809632
2011-10-27 16:35:24,733 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Thread started: Thread for
merging on-disk files
2011-10-27 16:35:24,733 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Thread waiting: Thread for
merging on-disk files
2011-10-27 16:35:24,734 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Thread started: Thread for
merging in memory files
2011-10-27 16:35:24,735 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Thread started: Thread for
polling Map Completion Events
2011-10-27 16:35:24,735 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1308 map output(s)
where 0 is already in progress
2011-10-27 16:35:24,736 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts
and0 dup hosts)
2011-10-27 16:35:29,738 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 12 outputs (0 slow
hosts and0 dup hosts)
2011-10-27 16:35:30,364 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 5 outputs (0 slow hosts
and753 dup hosts)
2011-10-27 16:35:30,367 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts
and1182 dup hosts)
2011-10-27 16:35:30,368 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts
and1184 dup hosts)
2011-10-27 16:35:30,371 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 2 outputs (0 slow hosts
and1073 dup hosts)

...

2011-10-27 16:36:04,284 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts
and958 dup hosts)

2011-10-27 16:36:04,310 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts
and958 dup hosts)
2011-10-27 16:36:07,721 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts
and950 dup hosts)
2011-10-27 16:36:16,455 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts
and948 dup hosts)
2011-10-27 16:36:16,464 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts
and948 dup hosts)
2011-10-27 16:36:24,736 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s)
where 12 is already in progress
2011-10-27 16:36:24,736 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts
and1049 dup hosts)
2011-10-27 16:37:24,737 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s)
where 12 is already in progress
2011-10-27 16:37:24,737 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts
and1049 dup hosts)
2011-10-27 16:38:24,738 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s)
where 12 is already in progress
2011-10-27 16:38:24,738 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts
and1049 dup hosts)
2011-10-27 16:39:24,739 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s)
where 12 is already in progress
2011-10-27 16:39:24,739 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts
and1049 dup hosts)
2011-10-27 16:40:24,740 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s)
where 12 is already in progress
2011-10-27 16:40:24,740 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts
and1049 dup hosts)

...

2011-10-27 21:55:25,070 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s)
where 12 is already in progress
2011-10-27 21:55:25,070 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts
and1049 dup hosts)
2011-10-27 21:56:25,071 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s)
where 12 is already in progress
2011-10-27 21:56:25,071 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts
and1049 dup hosts)
2011-10-27 21:57:25,072 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s)
where 12 is already in progress
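
To put a number on how long the reducer has been stuck, the "Need another"
lines above can be scanned with a small Python script. This is a minimal
sketch, not a Hadoop tool: the function names are made up for illustration,
and the sample text is just three entries copied from the log above.

```python
import re
from datetime import datetime

# Matches the periodic "Need another N map output(s)" progress lines.
PROGRESS = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.mapred\.ReduceTask: (\S+) "
    r"Need another (\d+) map output\(s\) where (\d+) is already in progress"
)


def progress_events(log_text):
    """Return (time, attempt, remaining, in_progress) for each progress line."""
    # The mail client wrapped the log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    events = []
    for ts, attempt, remaining, in_progress in PROGRESS.findall(flat):
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        events.append((when, attempt, int(remaining), int(in_progress)))
    return events


def trailing_stall(events):
    """Return (start, end, remaining) for the final run with no progress."""
    if not events:
        return None
    end, _, remaining, _ = events[-1]
    start = end
    for when, _, rem, _ in reversed(events):
        if rem != remaining:
            break
        start = when
    return start, end, remaining


SAMPLE = """
2011-10-27 16:35:24,735 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1308 map output(s)
where 0 is already in progress
2011-10-27 16:36:24,736 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s)
where 12 is already in progress
2011-10-27 21:57:25,072 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s)
where 12 is already in progress
"""

stall = trailing_stall(progress_events(SAMPLE))
start, end, remaining = stall
print(f"stuck at {remaining} remaining map outputs for {end - start}")
```

On the sample, the count of remaining map outputs stops changing at
16:36:24 and is still the same at 21:57:25, with the same 12 fetches
reported as in progress the whole time.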

