[jira] [Updated] (HADOOP-11334) Mapreduce Job Failed due to failure fetching mapper output on the reduce side

Jinghui Wang (JIRA) Tue, 25 Nov 2014 17:41:37 -0800

     [ https://issues.apache.org/jira/browse/HADOOP-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jinghui Wang updated HADOOP-11334:
----------------------------------
    Description: 
Running terasort with the following options:

hadoop jar hadoop-mapreduce-examples.jar terasort *-Dio.native.lib.available=false 
-Dmapreduce.map.output.compress=true 
-Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec* 
/tmp/tera-in /tmp/tera-out

The job failed because the reducers could not fetch the mapper output (see the 
following stack trace). The cause is that MAPREDUCE-1784 added support for 
handling a null compressor by defaulting to non-compressed output. In this 
case, when *io.native.lib.available* is set to false, the compressor is null, 
so the mapper writes its output uncompressed. However, the decompressor has a 
Java implementation, so when the reducer reads the mapper output it uses the 
decompressor, and the read fails because the output does not have the Gzip 
header.


2014-11-25 10:39:48,108 WARN [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to shuffle output of attempt_1416875111322_0005_m_000002_0 from bdvs130:13562
java.io.IOException: not a gzip file
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.processBasicHeader(BuiltInGzipDecompressor.java:495)
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeHeaderState(BuiltInGzipDecompressor.java:256)
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:185)
        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
        at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:434)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:341)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
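
The mismatch described above can be illustrated outside Hadoop with a minimal 
sketch. It does not use the Hadoop codec classes; it assumes java.util.zip as a 
stand-in, and the class and method names (NullCompressorMismatch, 
writeMapOutput, readMapOutput) are hypothetical. The writer falls back to raw 
bytes when no compressor is available, while the reader always decompresses, so 
the read fails on the missing Gzip header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class NullCompressorMismatch {

    // Map side (hypothetical helper): gzip the data only when a compressor is
    // available; otherwise fall back to writing the raw bytes unchanged.
    public static byte[] writeMapOutput(byte[] data, boolean compressorAvailable)
            throws IOException {
        if (!compressorAvailable) {
            return data.clone();
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        }
        return buf.toByteArray();
    }

    // Reduce side (hypothetical helper): always decompresses, because the job
    // configuration says the map output is gzip-compressed.
    public static byte[] readMapOutput(byte[] shuffled) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(shuffled))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] record = "terasort record".getBytes(StandardCharsets.UTF_8);

        // Compressor available: writer and reader agree, round trip succeeds.
        byte[] back = readMapOutput(writeMapOutput(record, true));
        System.out.println("round trip: " + new String(back, StandardCharsets.UTF_8));

        // Compressor missing: writer emits raw bytes, reader still expects gzip.
        try {
            readMapOutput(writeMapOutput(record, false));
            System.out.println("unexpected: read succeeded");
        } catch (IOException e) {
            System.out.println("read failed: " + e.getMessage());
        }
    }
}
```

In this sketch the failure comes from the two sides deciding independently 
whether compression is in effect. Either side could be changed to agree with 
the other; which fix is appropriate for Hadoop is not settled here.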

  was:
Running terasort with the following options hadoop jar 
hadoop-mapreduce-examples.jar terasort *-Dio.native.lib.available=false 
-Dmapreduce.map.output.compress=true 
-Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec*  
/tmp/tera-in /tmp/tera-out

The job failed with the reducer failed to fetching the output from mappers (see 
the following stacktrace). The problem is that in JIRA MAPREDUCE-1784, it added 
support to handle null compressors to default to non-compressed output. In this 
case, when the *io.native.lib.available* is set to true, the compressor will be 
null. However, the decompressor has a Java implementation, so when the reducer 
tries to read the mapper output, it uses the decompressor, but the output does 
not have the Gzip header.


2014-11-25 10:39:48,108 WARN [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to shuffle output of attempt_1416875111322_0005_m_000002_0 from bdvs130.svl.ibm.com:13562
java.io.IOException: not a gzip file
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.processBasicHeader(BuiltInGzipDecompressor.java:495)
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeHeaderState(BuiltInGzipDecompressor.java:256)
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:185)
        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
        at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:434)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:341)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)


> Mapreduce Job Failed due to failure fetching mapper output on the reduce side
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-11334
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11334
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 2.4.1
>            Reporter: Jinghui Wang
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
