[ 
https://issues.apache.org/jira/browse/HADOOP-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472998#comment-13472998
 ] 

Andy Isaacson commented on HADOOP-8900:
---------------------------------------

bq. It's kind of annoying to have to use 4GB of temporary space

Nope, it only writes the compressed file to disk; {{gzip -1}} compresses 4GB of 
zeros to 18 MiB.
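
As a rough illustration of why an input of about 4GB is the interesting size here: per RFC 1952, the last four bytes of a gzip member (ISIZE) hold the uncompressed length modulo 2^32, so a reader that compares a full 64-bit byte count against that field without masking would report a mismatch on large files. That this is the exact cause in BuiltInGzipDecompressor is an assumption on my part, not something stated in this comment. The sketch below streams zeros through {{java.util.zip.GZIPOutputStream}} at the fastest level (roughly what {{gzip -1}} does) without ever materializing the uncompressed data, then reads ISIZE back. The class and method names are hypothetical and are not part of the attached patches.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not code from the HADOOP-8900 patch.
final class GzipTrailerSketch {

    /** Gzips n zero bytes at the fastest level, reusing one 64 KiB input buffer. */
    static byte[] gzipZeros(long n) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Anonymous subclass gives access to the protected Deflater field `def`,
        // so the level can be lowered to BEST_SPEED before any data is written.
        try (OutputStream gz = new GZIPOutputStream(sink) {
            { def.setLevel(Deflater.BEST_SPEED); }
        }) {
            byte[] zeros = new byte[64 * 1024];
            for (long left = n; left > 0; ) {
                int chunk = (int) Math.min(zeros.length, left);
                gz.write(zeros, 0, chunk);
                left -= chunk;
            }
        }
        return sink.toByteArray();
    }

    /** Reads ISIZE: the last four bytes of the stream, little-endian, unsigned. */
    static long storedSize(byte[] gz) {
        long v = 0;
        for (int i = 1; i <= 4; i++) {
            v = (v << 8) | (gz[gz.length - i] & 0xFFL);
        }
        return v;
    }

    /** The value a reader should compare ISIZE against: the byte count mod 2^32. */
    static long expectedIsize(long uncompressedBytes) {
        return uncompressedBytes & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        long n = 1L << 20; // 1 MiB keeps the demo fast; the real test needs > 4 GiB
        byte[] gz = gzipZeros(n);
        System.out.println("compressed bytes: " + gz.length);
        System.out.println("ISIZE matches:    " + (storedSize(gz) == expectedIsize(n)));
    }
}
```

For a real regression test the sink would be a file rather than an in-memory buffer, and the input would need to exceed 4 GiB so that the full byte count and ISIZE actually differ.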

bq. Could you please port it to branch-1 so that we could integrate it into 
branch-1-win

Slavik, thanks for the review!

I don't have much experience with branch-1; would you like to take a shot at 
the port?  In particular, I don't know much about the test framework 
differences.  I will figure out the details and do the port later this week if 
you don't get to it first.
                
> BuiltInGzipDecompressor : java.io.IOException: stored gzip size doesn't match 
> decompressed size (Slavik Krassovsky)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8900
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 1-win, 2.0.1-alpha
>         Environment: Encountered failure when processing large GZIP file
>            Reporter: Slavik Krassovsky
>            Assignee: Andy Isaacson
>         Attachments: BuiltInGzipDecompressor2.patch, hadoop8900-2.txt, 
> hadoop8900.txt
>
>
> Encountered failure when processing large GZIP file
> • Gz: Failed in 1hrs, 13mins, 57sec with the error:
>  java.io.IOException: IO error in map input file 
> hdfs://localhost:9000/Halo4/json_m/gz/NewFileCat.txt.gz
>  at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:242)
>  at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
>  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>  at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>  at org.apache.hadoop.mapred.Child.main(Child.java:260)
>  Caused by: java.io.IOException: stored gzip size doesn't match decompressed 
> size
>  at 
> org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeTrailerState(BuiltInGzipDecompressor.java:389)
>  at 
> org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:224)
>  at 
> org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
>  at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
>  at java.io.InputStream.read(InputStream.java:102)
>  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
>  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
>  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
>  at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:66)
>  at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:32)
>  at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:67)
>  at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
>  ... 9 more

