Hi,

How many reducers are you using currently? Try increasing the number of reducers. Let me know if it helps.
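For a streaming job like the one described below, the reduce count is normally chosen when the job is submitted, for example by passing -D mapred.reduce.tasks=<N> (or the streaming option -numReduceTasks <N>) ahead of the other arguments. With only 100 reduce tasks, each reducer has to shuffle map output from all 15795 maps and hold a good part of it in memory, so spreading the work over more reducers should shrink the per-task shuffle footprint. The right number is something to tune for your cluster; the flags above are just the usual way to set it.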
On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart <kelly.burkh...@gmail.com> wrote:

> Hello, I'm seeing frequent fails in reduce jobs with errors similar to this:
>
> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_002153_0, compressed len: 172492, decompressed len: 172488
> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201102081823_0175_r_000034_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>
> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 172488 bytes (172492 raw bytes) into RAM from attempt_201102081823_0175_m_002153_0
> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_002118_0, compressed len: 161944, decompressed len: 161940
> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_001704_0, compressed len: 228365, decompressed len: 228361
> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201102081823_0175_r_000034_0: Failed fetch #1 from attempt_201102081823_0175_m_002153_0
> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201102081823_0175_r_000034_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>
> Some also show this:
>
> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
>         at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
>         at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>
> The particular job I'm running is an attempt to merge multiple time series files into a single file.
> The job tracker shows the following:
>
> Kind      Num Tasks   Complete   Killed   Failed/Killed Task Attempts
> map       15795       15795      0        0 / 29
> reduce    100         30         70       17 / 29
>
> All of the files I'm reading have records with a timestamp key similar to:
>
> 2011-01-03 08:30:00.457000<tab><record>
>
> My map job is a simple python program that ignores rows with times < 08:30:00 and > 15:00:00, determines the type of input row and writes it to stdout with very minor modification. It maintains no state and should not use any significant memory. My reducer is the IdentityReducer. The input files are individually gzipped then put into hdfs. The total uncompressed size of the output should be around 150G. Our cluster is 32 nodes, each of which has 16G RAM and most of which have two 2T drives. We're running hadoop 0.20.2.
>
> Can anyone provide some insight on how we can eliminate this issue? I'm certain this email does not provide enough info; please let me know what further information is needed to troubleshoot.
>
> Thanks in advance,
>
> -Kelly

--
Regards,
R.V.
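For reference, a streaming mapper along the lines described above might look roughly like the sketch below. This is only an illustration: the actual script is not shown in the thread, the tab-separated layout and the 08:30:00/15:00:00 cutoffs are taken from the description above, and the per-record handling is assumed.

#!/usr/bin/env python
# Sketch of a Hadoop streaming mapper that keeps only records whose
# timestamp falls inside the window described above. Assumptions (not
# from the thread): each input line looks like
# "YYYY-MM-DD HH:MM:SS.ffffff<tab>record".
import sys

START = "08:30:00"
END = "15:00:00"

for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue
    key, sep, value = line.partition("\t")
    # key is e.g. "2011-01-03 08:30:00.457000"; compare on the HH:MM:SS part,
    # which sorts correctly as a zero-padded string
    try:
        time_part = key.split(" ", 1)[1][:8]
    except IndexError:
        continue  # malformed line, skip it
    if time_part < START or time_part > END:
        continue  # outside the window, ignore the row
    # the real job also inspects the record type and makes minor changes;
    # that step is omitted here, the record is simply echoed
    sys.stdout.write(key + "\t" + value + "\n")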