Hi,
How many reducers are you using currently?
Try increasing the number of reducers.
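If you're submitting the job through Hadoop Streaming (you mention a Python
mapper and the IdentityReducer), the reducer count can be set on the command
line with -D mapred.reduce.tasks. A rough sketch only -- the jar path, the
input/output paths, the script name and the number 400 below are placeholders,
not values from your job:

  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
      -D mapred.reduce.tasks=400 \
      -input /path/to/input \
      -output /path/to/output \
      -mapper my_mapper.py \
      -file my_mapper.py \
      -reducer org.apache.hadoop.mapred.lib.IdentityReducer

With more reducers each one shuffles a smaller slice of the map output, so the
in-memory shuffle buffers are less likely to push the reduce task over its heap.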
Let me know if it helps.

On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart <kelly.burkh...@gmail.com> wrote:

> Hello, I'm seeing frequent fails in reduce jobs with errors similar to
> this:
>
>
> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
> header: attempt_201102081823_0175_m_002153_0, compressed len: 172492,
> decompressed len: 172488
> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner:
> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
> java.lang.OutOfMemoryError: Java heap space
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>
> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
> Shuffling 172488 bytes (172492 raw bytes) into RAM from
> attempt_201102081823_0175_m_002153_0
> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
> header: attempt_201102081823_0175_m_002118_0, compressed len: 161944,
> decompressed len: 161940
> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
> header: attempt_201102081823_0175_m_001704_0, compressed len: 228365,
> decompressed len: 228361
> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: Task
> attempt_201102081823_0175_r_000034_0: Failed fetch #1 from
> attempt_201102081823_0175_m_002153_0
> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner:
> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
> java.lang.OutOfMemoryError: Java heap space
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>
> Some also show this:
>
> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>        at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
>        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
>        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>        at
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>        at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>
> The particular job I'm running is an attempt to merge multiple time
> series files into a single file.  The job tracker shows the following:
>
>
> Kind    Num Tasks    Complete   Killed    Failed/Killed Task Attempts
> map     15795        15795      0         0 / 29
> reduce  100          30         70        17 / 29
>
> All of the files I'm reading have records with a timestamp key similar to:
>
> 2011-01-03 08:30:00.457000<tab><record>
>
> My map job is a simple Python program that ignores rows with times <
> 08:30:00 or > 15:00:00, determines the type of input row and writes
> it to stdout with very minor modification.  It maintains no state and
> should not use any significant memory.  My reducer is the
> IdentityReducer.  The input files are individually gzipped then put
> into hdfs.  The total uncompressed size of the output should be around
> 150G.  Our cluster is 32 nodes each of which has 16G RAM and most of
> which have two 2T drives.  We're running hadoop 0.20.2.
>
>
> Can anyone provide some insight on how we can eliminate this issue?
> I'm certain this email does not provide enough info; please let me
> know what further information is needed to troubleshoot.
>
> Thanks in advance,
>
> -Kelly
>



-- 
Regards,
R.V.
