[ https://issues.apache.org/jira/browse/TEZ-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172282#comment-14172282 ]

Rajesh Balamohan commented on TEZ-1634:
---------------------------------------

[~sseth] - True, Gzip overrides close(). However, this would not be a problem in IFile. In IFile.Writer we explicitly invoke
{code}
this.compressor.reset();
this.compressedOut = codec.createOutputStream(checksumOut, compressor);
{code}
This internally calls the following in GzipCodec:
{code}
@Override
public CompressionOutputStream createOutputStream(OutputStream out,
    Compressor compressor) throws IOException {
  return (compressor != null) ?
      new CompressorStream(out, compressor,
          conf.getInt("io.file.buffer.size", 4*1024)) :
      createOutputStream(out);
}
{code}
So it ends up creating a CompressorStream, and CompressorStream::close() automatically invokes finish() + close().

> BlockCompressorStream.finish() is called twice in IFile.close leading to Shuffle errors
> ---------------------------------------------------------------------------------------
>
>                 Key: TEZ-1634
>                 URL: https://issues.apache.org/jira/browse/TEZ-1634
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0, 0.6.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>             Fix For: 0.6.0
>
>         Attachments: BlockCompressorStream.with.logging.java, TEZ-1634.1.patch, TEZ-1634.2.patch, stacktrace-with-comments.txt
>
>
> When IFile.Writer is closed, it explicitly calls compressedOut.finish(). Then, as part of FSDataOutputStream.close(), finish() is called again internally. Please refer to o.a.h.i.compress.BlockCompressorStream for more details on finish(). This leads to an additional 4 bytes being written to the IFile, which causes intermittent failures during shuffle. It also prevents IFileInputStream from performing proper checksumming.
> This error happens only when we try to fetch multiple attempt outputs using the same URL, and it is easily reproducible with SnappyCodec. The first attempt's output is downloaded by the fetcher and, because of the last 4 bytes in the stream, IFileInputStream does not do the proper checksumming. This causes the subsequent attempt's download to fail with the following exception.
> An example error from the shuffle phase is shown below.
> >>>>
> 2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id
> java.lang.IllegalArgumentException: Invalid header received: partition: 0
>     at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
>     at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
>     at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
> >>>>
> I will attach the debug version of BlockCompressorStream with a thread dump (which validates that finish() is called twice in IFile.close()). This bug was present in earlier versions of Tez as well, and I was able to reproduce it consistently on a local VM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
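
For illustration only: the double-finish() effect described in the issue can be modelled with a toy stream. This is a sketch and is not taken from the Tez or Hadoop sources. The names DoubleFinishDemo, ToyBlockStream and bytesWritten are hypothetical, and the toy finish() unconditionally appends a 4-byte marker, which is a simplifying assumption. The real BlockCompressorStream depends on compressor state that is not modelled here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class DoubleFinishDemo {

    /** Hypothetical stand-in for a block-compressed stream; NOT Hadoop code. */
    static class ToyBlockStream extends OutputStream {
        private final DataOutputStream out;

        ToyBlockStream(OutputStream out) {
            this.out = new DataOutputStream(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
        }

        /** Assumption: every call appends a 4-byte marker, so it is not idempotent. */
        void finish() throws IOException {
            out.writeInt(0);
        }

        /** Like the streams discussed above, close() finishes before closing. */
        @Override
        public void close() throws IOException {
            finish();
            out.close();
        }
    }

    /** Writes 3 payload bytes, optionally calls finish() explicitly, then closes. */
    static int bytesWritten(boolean explicitFinish) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ToyBlockStream stream = new ToyBlockStream(sink);
        stream.write(new byte[] {1, 2, 3});
        if (explicitFinish) {
            stream.finish(); // the caller finishes explicitly...
        }
        stream.close();      // ...and close() finishes again
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("close() only:       " + bytesWritten(false) + " bytes");
        System.out.println("finish() + close(): " + bytesWritten(true) + " bytes");
    }
}
```

In this toy model the second run produces exactly 4 more bytes than the first, which mirrors the symptom in the issue description. The usual remedies are to make finish() idempotent or to call it in only one place.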