Hello,

We have a 12-node Hadoop cluster running Hadoop 0.20.2-cdh3u0. Each node has 8 cores and 144 GB of RAM (don't ask). I want to take advantage of all this RAM and run the map-reduce jobs mostly in memory, with no spills if possible. We use Hive for most of our processing. I have set:

mapred.tasktracker.map.tasks.maximum = 16
mapred.tasktracker.reduce.tasks.maximum = 8
mapred.child.java.opts = 6144
mapred.reduce.parallel.copies = 20
mapred.job.shuffle.merge.percent = 1.0
mapred.job.reduce.input.buffer.percent = 0.25
mapred.inmem.merge.threshold = 0
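These settings can be sanity-checked with some memory arithmetic. The sketch below is illustrative only and rests on two assumptions: that "mapred.child.java.opts = 6144" stands for a 6144 MB heap (-Xmx6144m), and that this 0.20-based release sizes a reducer's in-memory shuffle buffer the way the 0.20 ShuffleRamManager does, i.e. the JVM's maximum heap capped at Integer.MAX_VALUE, multiplied in Java float arithmetic by mapred.job.shuffle.input.buffer.percent (default 0.70, not among the settings above), with a single map output limited to 25% of that. The helper names are hypothetical.

```python
import struct

INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE


def to_float32(x):
    """Round a Python float (a double) to IEEE-754 single precision, like a Java float."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


# Settings quoted in the post; the heap value is ASSUMED to mean -Xmx6144m.
MAP_SLOTS = 16
REDUCE_SLOTS = 8
CHILD_HEAP_MB = 6144
NODE_RAM_MB = 144 * 1024


def child_heap_total_mb(map_slots, reduce_slots, heap_mb):
    """Worst case per node: every slot running a child JVM at its full heap."""
    return (map_slots + reduce_slots) * heap_mb


def shuffle_ram_limits(max_heap_bytes, input_buffer_percent=0.70):
    """ASSUMED 0.20-style ShuffleRamManager sizing.

    The heap is capped at Integer.MAX_VALUE, scaled by the shuffle input buffer
    percent using float math, and a single map output may use at most 25% of
    the result. Returns (memory_limit, max_single_shuffle_limit) in bytes.
    """
    capped = min(max_heap_bytes, INT_MAX)
    memory_limit = int(to_float32(to_float32(capped) * to_float32(input_buffer_percent)))
    max_single = int(to_float32(to_float32(memory_limit) * to_float32(0.25)))
    return memory_limit, max_single


total = child_heap_total_mb(MAP_SLOTS, REDUCE_SLOTS, CHILD_HEAP_MB)
print(total, "MB of child heap vs", NODE_RAM_MB, "MB of RAM per node")
# The real Runtime.maxMemory() can be slightly below -Xmx, but for a 6 GB heap
# it is still far above the Integer.MAX_VALUE cap, so the result is the same.
print(shuffle_ram_limits(CHILD_HEAP_MB * 1024 * 1024))
```

Under these assumptions the 24 task slots at 6144 MB each add up to 147,456 MB, which is the node's entire 144 GB before the DataNode, TaskTracker, OS and JVM overhead are counted. The computed limits also equal the MemoryLimit and MaxSingleShuffleLimit values in the ShuffleRamManager line of the log, which suggests each reducer keeps at most about 1.4 GB of map output in memory no matter how large its heap is.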
One of my Hive queries compiles into a 6-stage sequence of map-reduce jobs. In the third stage, which queries a 200 GB table, the last 14 reducers hang. I set mapred.task.timeout to 0 to see whether they really hang; it has been 5 hours, so something is terribly wrong in my setup. Part of the log is below.

My questions:
* What should my configuration be to make the reducers run in memory?
* Why does it keep waiting for map outputs?
* What does "dup hosts" mean?

Thank you!
N.Gesli

2011-10-27 16:35:24,304 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2011-10-27 16:35:24,611 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2011-10-27 16:35:24,722 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=1503238528, MaxSingleShuffleLimit=375809632
2011-10-27 16:35:24,733 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Thread started: Thread for merging on-disk files
2011-10-27 16:35:24,733 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Thread waiting: Thread for merging on-disk files
2011-10-27 16:35:24,734 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Thread started: Thread for merging in memory files
2011-10-27 16:35:24,735 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Thread started: Thread for polling Map Completion Events
2011-10-27 16:35:24,735 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Need another 1308 map output(s) where 0 is already in progress
2011-10-27 16:35:24,736 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2011-10-27 16:35:29,738 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 12 outputs (0 slow hosts and0 dup hosts)
2011-10-27 16:35:30,364 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 5 outputs (0 slow hosts and753 dup hosts)
2011-10-27 16:35:30,367 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts and1182 dup hosts)
2011-10-27 16:35:30,368 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts and1184 dup hosts)
2011-10-27 16:35:30,371 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 2 outputs (0 slow hosts and1073 dup hosts)
...
2011-10-27 16:36:04,284 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts and958 dup hosts)
2011-10-27 16:36:04,310 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts and958 dup hosts)
2011-10-27 16:36:07,721 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts and950 dup hosts)
2011-10-27 16:36:16,455 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts and948 dup hosts)
2011-10-27 16:36:16,464 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 1 outputs (0 slow hosts and948 dup hosts)
2011-10-27 16:36:24,736 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s) where 12 is already in progress
2011-10-27 16:36:24,736 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts and1049 dup hosts)
2011-10-27 16:37:24,737 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s) where 12 is already in progress
2011-10-27 16:37:24,737 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts and1049 dup hosts)
2011-10-27 16:38:24,738 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s) where 12 is already in progress
2011-10-27 16:38:24,738 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts and1049 dup hosts)
2011-10-27 16:39:24,739 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s) where 12 is already in progress
2011-10-27 16:39:24,739 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts and1049 dup hosts)
2011-10-27 16:40:24,740 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s) where 12 is already in progress
2011-10-27 16:40:24,740 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts and1049 dup hosts)
...
2011-10-27 21:55:25,070 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s) where 12 is already in progress
2011-10-27 21:55:25,070 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts and1049 dup hosts)
2011-10-27 21:56:25,071 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s) where 12 is already in progress
2011-10-27 21:56:25,071 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Scheduled 0 outputs (0 slow hosts and1049 dup hosts)
2011-10-27 21:57:25,072 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201110201507_0061_r_000007_1 Need another 1061 map output(s) where 12 is already in progress
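The stalled part of the log can be checked mechanically. The sketch below pairs each "Need another N map output(s) where M is already in progress" line with the "Scheduled ... dup hosts" line that follows it. It assumes, as in the 0.20 ReduceTask copier, that the reducer fetches from only one host at a time per host, so a pending map output is counted under "dup hosts" when a copy from its host is already in flight. In every stalled pair of the log, 1061 - 12 = 1049, which on a 12-node cluster is consistent with every host having one fetch outstanding that never completes. The function names and the sample prefix are hypothetical.

```python
import re

# Patterns for the two ReduceTask copier lines in the log. Hadoop prints
# "and<number>" with no space, so the space is optional here.
NEED_RE = re.compile(
    r"Need another (\d+) map output\(s\) where (\d+) is already in progress")
SCHEDULED_RE = re.compile(
    r"Scheduled (\d+) outputs \((\d+) slow hosts and\s*(\d+) dup hosts\)")


def summarize(lines):
    """Pair each 'Need another' line with the next 'Scheduled' line.

    Returns a list of (needed, in_progress, scheduled, slow, dup) tuples;
    'Scheduled' lines with no preceding 'Need another' line are skipped.
    """
    rows = []
    pending = None
    for line in lines:
        need = NEED_RE.search(line)
        if need:
            pending = (int(need.group(1)), int(need.group(2)))
            continue
        sched = SCHEDULED_RE.search(line)
        if sched and pending is not None:
            rows.append(pending + tuple(int(g) for g in sched.groups()))
            pending = None
    return rows


def all_hosts_busy(row):
    """True when nothing new was scheduled and every output not already in
    flight was skipped as a 'dup host' (ASSUMED meaning: its host is busy)."""
    needed, in_progress, scheduled, slow, dup = row
    return scheduled == 0 and slow == 0 and needed - in_progress == dup


PREFIX = ("2011-10-27 21:55:25,070 INFO org.apache.hadoop.mapred.ReduceTask: "
          "attempt_201110201507_0061_r_000007_1 ")
SAMPLE = [
    PREFIX + "Need another 1061 map output(s) where 12 is already in progress",
    PREFIX + "Scheduled 0 outputs (0 slow hosts and1049 dup hosts)",
]

for row in summarize(SAMPLE):
    print(row, "all hosts busy:", all_hosts_busy(row))
```

Feeding the whole reducer log through summarize() would show whether the in-progress count ever drops below 12 after 16:36; if it never does, the question becomes why those 12 fetches stall rather than why the reducer keeps waiting.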