[ https://issues.apache.org/jira/browse/TEZ-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153872#comment-14153872 ]
Rajesh Balamohan commented on TEZ-1634:
---------------------------------------

lgtm. +1.

> BlockCompressorStream.finish() is called twice in IFile.close leading to Shuffle errors
> ---------------------------------------------------------------------------------------
>
>                 Key: TEZ-1634
>                 URL: https://issues.apache.org/jira/browse/TEZ-1634
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: BlockCompressorStream.with.logging.java, TEZ-1634.1.patch, TEZ-1634.2.patch, stacktrace-with-comments.txt
>
>
> When IFile.Writer is closed, it explicitly calls compressedOut.finish(). Then,
> as part of FSDataOutputStream.close(), finish() is called again internally.
> Please refer to o.a.h.i.compress.BlockCompressorStream for more details on
> finish(). The second call writes an additional 4 bytes to the IFile. This
> causes intermittent failures during shuffle, and it also prevents
> IFileInputStream from doing proper checksumming.
> This error happens only when we try to fetch multiple attempt outputs using
> the same URL, and it is easily reproducible with the Snappy compression codec.
> The first attempt's output is downloaded by the fetcher, but because of the
> last 4 bytes in the stream, IFileInputStream does not do the proper
> checksumming. This causes the download of the subsequent attempt to fail with
> the following exception.
> Example error in shuffle phase is attached below.
> >>>>
> 2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id
> java.lang.IllegalArgumentException: Invalid header received: partition: 0
>     at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
>     at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
>     at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
> >>>>
> I will attach the debug version of BlockCompressorStream with a thread dump
> (which validates that finish() is called twice in IFile.close()). This bug
> was present in earlier versions of Tez as well, and I was able to reproduce
> it consistently on a local VM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
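
For illustration only, the double-finish effect described in the issue can be sketched with a toy stream. This is not the Hadoop or Tez code: ToyBlockStream, its 4-byte trailer, and the "guarded" flag are hypothetical stand-ins, chosen only to show how an explicit finish() followed by a close() that also calls finish() can append 4 extra bytes, and how a run-once guard avoids that. The actual fix is whatever the attached TEZ-1634 patches do.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class DoubleFinishDemo {

    /** Hypothetical stand-in for a block compressor stream: finish() appends a 4-byte trailer. */
    static class ToyBlockStream extends FilterOutputStream {
        private final boolean guarded;    // when true, finish() has an effect only once
        private boolean finished = false;

        ToyBlockStream(OutputStream out, boolean guarded) {
            super(out);
            this.guarded = guarded;
        }

        void finish() throws IOException {
            if (guarded && finished) {
                return;                   // second call is a no-op
            }
            out.write(new byte[4]);       // toy trailer: one 4-byte int
            finished = true;
        }

        @Override
        public void close() throws IOException {
            finish();                     // close() finishes the stream on its own
            super.close();
        }
    }

    /** Writes a 3-byte payload, calls finish() explicitly, then close(); returns total bytes. */
    static int bytesWritten(boolean guarded) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            ToyBlockStream stream = new ToyBlockStream(sink, guarded);
            stream.write(new byte[] {1, 2, 3});
            stream.finish();              // explicit call, as the writer does on close
            stream.close();               // calls finish() again internally
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + bytesWritten(false));
        System.out.println("guarded:   " + bytesWritten(true));
    }
}
```

In this toy, the unguarded stream ends up 4 bytes longer than the guarded one, which mirrors the stray trailing bytes the issue describes. Dropping the explicit finish() call would have the same effect as the guard here.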