[ 
https://issues.apache.org/jira/browse/HADOOP-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788739#comment-16788739
 ] 

Hudson commented on HADOOP-16109:
---------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16166 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16166/])
HADOOP-16109. Parquet reading S3AFileSystem causes EOF (stevel: rev 
0cbe9ad8c23d3fd2ac8ac423b6d2df68de720981)
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractSeek.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractSeekTest.java


> Parquet reading S3AFileSystem causes EOF
> ----------------------------------------
>
>                 Key: HADOOP-16109
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16109
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.2, 2.8.5, 3.3.0, 3.1.2
>            Reporter: Dave Christianson
>            Assignee: Steve Loughran
>            Priority: Blocker
>             Fix For: 3.3.0
>
>
> When using S3AFileSystem to read Parquet files a specific set of 
> circumstances causes an  EOFException that is not thrown when reading the 
> same file from local disk
> Note this has only been observed under specific circumstances:
>   - when the reader is doing a projection (will cause it to do a seek 
> backwards and put the filesystem into random mode)
>  - when the file is larger than the readahead buffer size
>  - when the seek behavior of the Parquet reader causes the reader to seek 
> towards the end of the current input stream without reopening, such that the 
> next read on the currently open stream will read past the end of the 
> currently open stream.
> Exception from Parquet reader is as follows:
> {code}
> Caused by: java.io.EOFException: Reached the end of stream with 51 bytes left 
> to read
>  at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:104)
>  at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
>  at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
>  at 
> org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174)
>  at 
> org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:127)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
>  at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
>  at 
> org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormatBase.fetchNext(HadoopInputFormatBase.java:206)
>  at 
> org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormatBase.reachedEnd(HadoopInputFormatBase.java:199)
>  at 
> org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:190)
>  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The following example program generate the same root behavior (sans finding a 
> Parquet file that happens to trigger this condition) by purposely reading 
> past the already active readahead range on any file >= 1029 bytes in size.. 
> {code:java}
> final Configuration conf = new Configuration();
> conf.set("fs.s3a.readahead.range", "1K");
> conf.set("fs.s3a.experimental.input.fadvise", "random");
> final FileSystem fs = FileSystem.get(path.toUri(), conf);
> // forward seek reading across readahead boundary
> try (FSDataInputStream in = fs.open(path)) {
>     final byte[] temp = new byte[5];
>     in.readByte();
>     in.readFully(1023, temp); // <-- works
> }
> // forward seek reading from end of readahead boundary
> try (FSDataInputStream in = fs.open(path)) {
>  final byte[] temp = new byte[5];
>  in.readByte();
>  in.readFully(1024, temp); // <-- throws EOFException
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to