[ https://issues.apache.org/jira/browse/HADOOP-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471908#comment-13471908 ]
Slavik Krassovsky commented on HADOOP-8900:
-------------------------------------------

It's a quirk of Java: an int literal used as a mask stays an int even when it is applied to a long value. The literal 0xffffffff is the int -1, and binary numeric promotion sign-extends it to 0xffffffffffffffffL before the &, so it masks nothing. This is legitimate per the language rules, yet easy to trip over.

/**
 * Hadoop Gzip issue repro
 * @author viatk
 */
public class Repro {
    public static void main(String[] args) {
        long smallLongValue = 665615408L;   // 0x027AC7C30
        long largeLongValue = 9255550000L;  // 0x227AC7C30

        // 0xffffffff is the int -1; widened to long it becomes 0xffffffffffffffffL,
        // so this "mask" leaves largeLongValue unchanged.
        long largeValueWithIntMask = (largeLongValue & 0xffffffff);
        // 0xffffffffL is a long literal, so this keeps only the low 32 bits.
        long largeValueWithLongMask = (largeLongValue & 0xffffffffL);

        System.out.println("smallLongValue= " + smallLongValue);
        System.out.println("largeLongValue= " + largeLongValue);
        System.out.println("largeValueWithIntMask =" + largeValueWithIntMask);
        System.out.println("largeValueWithLongMask =" + largeValueWithLongMask);
        System.out.println();

        if (largeValueWithIntMask != largeValueWithLongMask) {
            System.out.println("Here is your repro - largeValueWithIntMask != largeValueWithLongMask");
        }
        if (smallLongValue != largeValueWithIntMask) {
            System.out.println("Thus smallLongValue != largeValueWithIntMask");
        }
        if (smallLongValue == largeValueWithLongMask) {
            System.out.println("The fix is to compare Long values with long values with long masks.");
        }
    }
}

Output:

smallLongValue= 665615408
largeLongValue= 9255550000
largeValueWithIntMask =9255550000
largeValueWithLongMask =665615408

Here is your repro - largeValueWithIntMask != largeValueWithLongMask
Thus smallLongValue != largeValueWithIntMask
The fix is to compare Long values with long values with long masks.

Chuan Liu added a comment - 24/Aug/12 11:32 AM

+1. We found this bug while working with an internal customer. The bug exists on Linux as well. The root cause is that we compare long values that were masked with an int mask.
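To make the failing comparison concrete, here is a minimal sketch of a gzip trailer size check. It assumes (per RFC 1952) that the trailer's ISIZE field holds the uncompressed length modulo 2^32 and that the decompressor counts its output in a long. The class `IsizeCheck` and its methods are hypothetical and are not the actual `BuiltInGzipDecompressor` code; they only contrast the int-mask and long-mask variants using the numbers from the repro.

```java
/**
 * Hypothetical sketch (not Hadoop's BuiltInGzipDecompressor code) of a gzip
 * trailer ISIZE check. ISIZE is the uncompressed length modulo 2^32, so a
 * long byte counter must be reduced to its low 32 bits before comparing.
 */
public class IsizeCheck {

    // Buggy variant: 0xffffffff is the int -1, which is sign-extended to
    // 0xffffffffffffffffL before the &, so nothing is masked off.
    static boolean matchesBuggy(long storedIsize, long totalBytesOut) {
        return storedIsize == (totalBytesOut & 0xffffffff);
    }

    // Fixed variant: the long literal 0xffffffffL keeps only the low 32 bits.
    static boolean matchesFixed(long storedIsize, long totalBytesOut) {
        return storedIsize == (totalBytesOut & 0xffffffffL);
    }

    public static void main(String[] args) {
        long totalBytesOut = 9255550000L; // more than 4 GiB of decompressed data
        long storedIsize = 665615408L;    // 9255550000 mod 2^32, as the trailer would store it
        System.out.println("buggy check: " + matchesBuggy(storedIsize, totalBytesOut));
        System.out.println("fixed check: " + matchesFixed(storedIsize, totalBytesOut));
    }
}
```

For decompressed output below 2^32 bytes both variants agree, because masking with -1L and with 0xffffffffL leaves such values unchanged; that is why the mismatch only surfaces on large files like the one in the report.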
> BuiltInGzipDecompressor : java.io.IOException: stored gzip size doesn't match decompressed size (Slavik Krassovsky)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8900
>             Project: Hadoop Common
>          Issue Type: Bug
>         Environment: Encountered failure when processing large GZIP file
>            Reporter: Slavik Krassovsky
>
> Encountered failure when processing large GZIP file
> • Gz: Failed in 1hrs, 13mins, 57sec with the error:
> java.io.IOException: IO error in map input file hdfs://localhost:9000/Halo4/json_m/gz/NewFileCat.txt.gz
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:242)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:260)
> Caused by: java.io.IOException: stored gzip size doesn't match decompressed size
>     at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeTrailerState(BuiltInGzipDecompressor.java:389)
>     at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:224)
>     at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
>     at java.io.InputStream.read(InputStream.java:102)
>     at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
>     at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
>     at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
>     at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:66)
>     at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:32)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:67)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
>     ... 9 more
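The trace fails in `executeTrailerState`, where the size stored in the gzip trailer is checked against the decompressed size. As a sketch of that check done with a long mask, the following reads the last four bytes of a stream produced by `java.util.zip.GZIPOutputStream` (the ISIZE field, little-endian per RFC 1952) and compares them with the decompressed length. `GzipTrailer`, `readIsize`, `sizeMatches`, and `compress` are hypothetical names, and the sketch assumes a single-member gzip stream held fully in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical sketch: read ISIZE from the 8-byte gzip trailer (CRC32, then
 * ISIZE, both little-endian) and compare it with a long length using a long mask.
 */
public class GzipTrailer {

    // Returns ISIZE (the last 4 bytes) as an unsigned 32-bit value widened to long.
    static long readIsize(byte[] gzipBytes) {
        ByteBuffer trailer = ByteBuffer.wrap(gzipBytes, gzipBytes.length - 4, 4)
                                       .order(ByteOrder.LITTLE_ENDIAN);
        return Integer.toUnsignedLong(trailer.getInt());
    }

    // Compares ISIZE with the low 32 bits of the decompressed length.
    static boolean sizeMatches(byte[] gzipBytes, long decompressedLength) {
        return readIsize(gzipBytes) == (decompressedLength & 0xffffffffL);
    }

    // Compresses data in memory; closing the stream writes the trailer.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        Arrays.fill(data, (byte) 'a');
        byte[] compressed = compress(data);

        System.out.println("ISIZE = " + readIsize(compressed));
        System.out.println("matches = " + sizeMatches(compressed, data.length));
        // A hypothetical counter exactly 4 GiB larger still matches, since ISIZE wraps mod 2^32.
        System.out.println("matches after wrap = " + sizeMatches(compressed, data.length + (1L << 32)));
    }
}
```

`Integer.toUnsignedLong` is used so that an ISIZE at or above 2^31 is not read back as a negative long, which would reintroduce the same sign-extension problem on the trailer side of the comparison.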