[ https://issues.apache.org/jira/browse/HADOOP-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472998#comment-13472998 ]
Andy Isaacson commented on HADOOP-8900:
---------------------------------------

bq. It's kind of annoying to have to use 4GB of temporary space

Nope, it only writes the compressed file to disk; {{gzip -1}} compresses 4GB of zeros to 18 MiB.

bq. Could you please port it to branch-1 so that we could integrate it to branch-1-win

Slavik, thanks for the review! I don't have much experience with branch-1; would you like to take a shot at the port? In particular, I don't know much about the test framework differences. I will figure out the details and do the port later this week if you don't get to it first.

> BuiltInGzipDecompressor : java.io.IOException: stored gzip size doesn't match decompressed size (Slavik Krassovsky)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8900
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 1-win, 2.0.1-alpha
>         Environment: Encountered failure when processing large GZIP file
>            Reporter: Slavik Krassovsky
>            Assignee: Andy Isaacson
>         Attachments: BuiltInGzipDecompressor2.patch, hadoop8900-2.txt, hadoop8900.txt
>
>
> Encountered failure when processing large GZIP file
> • Gz: Failed in 1hrs, 13mins, 57sec with the error:
> java.io.IOException: IO error in map input file hdfs://localhost:9000/Halo4/json_m/gz/NewFileCat.txt.gz
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:242)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:260)
> Caused by: java.io.IOException: stored gzip size doesn't match decompressed size
> 	at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeTrailerState(BuiltInGzipDecompressor.java:389)
> 	at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:224)
> 	at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
> 	at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
> 	at java.io.InputStream.read(InputStream.java:102)
> 	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
> 	at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
> 	at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
> 	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:66)
> 	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:32)
> 	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:67)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
> 	... 9 more
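For context on the exception: RFC 1952 defines the gzip trailer's ISIZE field as the size of the uncompressed input modulo 2^32. A trailer check that compares ISIZE against the full decompressed byte count would therefore fail spuriously for any member of 4 GiB or more, which is presumably why the test uses 4GB of zeros. The sketch below assumes the decompressor keeps its running output total in a `long`; `GzipIsizeCheck`, `isizeMatches`, `readIsize`, and `gzip` are hypothetical names for illustration, not Hadoop's actual code or API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipIsizeCheck {

    // Hypothetical helper: ISIZE (RFC 1952) holds the uncompressed length
    // modulo 2^32, so compare it with only the low 32 bits of the total.
    static boolean isizeMatches(long storedIsize, long totalDecompressed) {
        return storedIsize == (totalDecompressed & 0xFFFFFFFFL);
    }

    // Reads the 4-byte little-endian ISIZE field at the very end of a
    // single-member gzip stream (the trailer is CRC32 followed by ISIZE).
    static long readIsize(byte[] gz) {
        int n = gz.length;
        return (gz[n - 4] & 0xFFL)
                | (gz[n - 3] & 0xFFL) << 8
                | (gz[n - 2] & 0xFFL) << 16
                | (gz[n - 1] & 0xFFL) << 24;
    }

    // Compresses a byte array in memory with java.util.zip's gzip writer.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzip(new byte[100_000]); // 100,000 zero bytes
        long stored = readIsize(gz);
        System.out.println("ISIZE = " + stored);
        System.out.println(isizeMatches(stored, 100_000L));

        // A member of 4 GiB + 10 bytes legitimately stores ISIZE = 10.
        long total = (1L << 32) + 10;
        System.out.println(total == 10);             // naive comparison
        System.out.println(isizeMatches(10L, total)); // masked comparison
    }
}
```

Running `main` prints the stored ISIZE for a 100,000-byte input (100000) and `true` for the masked check, then `false` for a naive equality test against a 4 GiB + 10 byte total whose trailer says 10, and `true` for the masked comparison of the same values.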