[ https://issues.apache.org/jira/browse/HADOOP-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-11202:
------------------------------------
    Component/s: fs/s3

Moved to Hadoop Common, tagged as fs/s3. Corby, there's a new "s3a" FS client that uses the AWS APIs directly -- could you try that to see if it behaves better?

> SequenceFile crashes with encrypted files that are shorter than FileSystem.getStatus(path)
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11202
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.2.0
>         Environment: Amazon EMR 3.0.4
>            Reporter: Corby Wilson
>
> Encrypted files are often padded to allow for proper encryption on a 2^n-bit boundary. As a result, the encrypted file can be a few bytes larger than the unencrypted file.
> We have a case where an encrypted file is 2 bytes larger due to padding.
> When we run a Hive job on the file to get a record count (select count(*) from <table>), it runs org.apache.hadoop.mapred.SequenceFileRecordReader and loads the file in through a custom FS InputStream.
> The InputStream decrypts the file as it is read in. Splits are handled properly, as the stream implements both Seekable and PositionedReadable.
> When the org.apache.hadoop.io.SequenceFile class initializes, it reads the file size from the file metadata, which returns the size of the encrypted file on disk (or, in this case, in S3).
> However, the actual file size is 2 bytes less, so the InputStream returns EOF (-1) before the SequenceFile thinks it is done.
> As a result, SequenceFile$Reader tries to run next -> readRecordLength after the file has been closed, and we get a crash.
> The SequenceFile class SHOULD instead pay attention to the EOF marker from the stream, rather than the file size reported in the metadata, and set the 'more' flag accordingly.
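The failure mode described above can be sketched as follows. This is a hypothetical, self-contained illustration (not Hadoop's actual SequenceFile code): a decrypted stream that is 2 bytes shorter than the length the metadata reports, read in a loop that trusts the stream's EOF rather than a position check against the metadata size.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofAwareRead {
    public static void main(String[] args) throws IOException {
        // Hypothetical setup: the decrypted payload is 2 bytes shorter than
        // the size reported for the padded, encrypted file in the metadata.
        byte[] decrypted = new byte[10];
        long metadataLength = decrypted.length + 2; // 12, per the encrypted file

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(decrypted));
        long pos = 0;
        boolean more = true;
        while (more) {
            // A loop conditioned on (pos < metadataLength) would keep reading
            // past the true end of the decrypted data and crash with EOFException.
            try {
                int recordLength = in.readInt(); // throws EOFException at the real end
                in.skipBytes(recordLength);
                pos += 4 + recordLength;
            } catch (EOFException eof) {
                more = false; // trust the stream's EOF, not the metadata size
            }
        }
        System.out.println("stopped cleanly at pos=" + pos
                + ", metadata said " + metadataLength);
    }
}
```

With a 10-byte stream of zeros, two 4-byte length words are read before the third `readInt` hits the real end, so the loop stops cleanly at pos=8 even though the metadata claimed 12 bytes.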
> Sample stack dump from crash:
> 2014-10-10 21:25:27,160 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: java.io.EOFException
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:433)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.io.IOException: java.io.EOFException
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
>     ... 11 more
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>     at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:2332)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2363)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2500)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
>     ... 15 more

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)