[ https://issues.apache.org/jira/browse/HADOOP-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-11202:
------------------------------------
    Component/s: fs/s3

Moved to Hadoop Common, tagged as fs/s3. Corby, there's a new "s3a" FS client that uses the AWS APIs directly -- could you try that to see if it behaves better?

> SequenceFile crashes with encrypted files that are shorter than FileSystem.getStatus(path)
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11202
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.2.0
>         Environment: Amazon EMR 3.0.4
>            Reporter: Corby Wilson
>
> Encrypted files are often padded to allow for proper encryption on a 2^n-bit boundary. As a result, the encrypted file can be a few bytes larger than the unencrypted file.
> We have a case where an encrypted file is 2 bytes larger due to padding.
> When we run a Hive job on the file to get a record count (select count(*) from <table>), it runs org.apache.hadoop.mapred.SequenceFileRecordReader and loads the file in through a custom FS InputStream.
> The InputStream decrypts the file as it is read in. Splits are handled properly, as the stream implements both Seekable and PositionedReadable.
> When the org.apache.hadoop.io.SequenceFile class initializes, it reads the file size from the file metadata, which returns the size of the encrypted file on disk (or, in this case, in S3).
> However, the actual file size is 2 bytes less, so the InputStream returns EOF (-1) before the SequenceFile thinks it is done.
> As a result, SequenceFile$Reader tries to run next -> readRecordLength after the file has been closed, and we get a crash.
> The SequenceFile class SHOULD instead pay attention to the EOF marker from the stream, rather than the file size reported in the metadata, and set the 'more' flag accordingly.
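The failure mode described above can be sketched as follows. This is a hypothetical, self-contained illustration (not Hadoop's actual SequenceFile code): a decrypted stream that is 2 bytes shorter than the length the metadata reports, read in a loop that trusts the stream's EOF rather than a position check against the metadata size.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofAwareRead {
    public static void main(String[] args) throws IOException {
        // Hypothetical setup: the decrypted payload is 2 bytes shorter than
        // the size reported for the padded, encrypted file in the metadata.
        byte[] decrypted = new byte[10];
        long metadataLength = decrypted.length + 2; // 12, per the encrypted file

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(decrypted));
        long pos = 0;
        boolean more = true;
        while (more) {
            // A loop conditioned on (pos < metadataLength) would keep reading
            // past the true end of the decrypted data and crash with EOFException.
            try {
                int recordLength = in.readInt(); // throws EOFException at the real end
                in.skipBytes(recordLength);
                pos += 4 + recordLength;
            } catch (EOFException eof) {
                more = false; // trust the stream's EOF, not the metadata size
            }
        }
        System.out.println("stopped cleanly at pos=" + pos
                + ", metadata said " + metadataLength);
    }
}
```

With a 10-byte stream of zeros, two 4-byte length words are read before the third `readInt` hits the real end, so the loop stops cleanly at pos=8 even though the metadata claimed 12 bytes.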
> Sample stack dump from crash:
> 2014-10-10 21:25:27,160 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: java.io.EOFException
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:433)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.io.IOException: java.io.EOFException
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
>     ... 11 more
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>     at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:2332)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2363)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2500)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
>     ... 15 more

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)