Venkata Puneet Ravuri created HADOOP-11270:
----------------------------------------------

             Summary: Seek behavior difference between NativeS3FsInputStream and DFSInputStream
                 Key: HADOOP-11270
                 URL: https://issues.apache.org/jira/browse/HADOOP-11270
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
            Reporter: Venkata Puneet Ravuri
            Assignee: Venkata Puneet Ravuri


There is a difference in seek behavior between a file in S3 read through
NativeS3FileSystem$NativeS3FsInputStream and a file in HDFS read through
DFSInputStream.

If we seek to the end of the file with NativeS3FsInputStream, it fails with the
exception "java.io.EOFException: Attempted to seek or read past the end of the
file". That is because a getObject request is issued on the S3 object with the
range start set to the length of the file, which is outside the valid byte
range. DFSInputStream, in contrast, allows a seek to exactly the end of the
file; a subsequent read simply returns -1.
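The divergence can be sketched as follows. This is a minimal standalone
illustration of the two seek contracts, not the actual Hadoop code; the class
and method names here are hypothetical:

```java
import java.io.EOFException;
import java.io.IOException;

public class SeekContract {

    // NativeS3FsInputStream-style seek: it issues a ranged getObject
    // starting at pos, so pos == length produces an out-of-range request
    // and surfaces as an EOFException.
    static void s3StyleSeek(long pos, long length) throws IOException {
        if (pos >= length) {
            throw new EOFException(
                "Attempted to seek or read past the end of the file");
        }
        // ... would issue getObject with Range: bytes=pos- ...
    }

    // DFSInputStream-style seek: seeking to exactly length is legal;
    // only seeking strictly past EOF fails. A later read() returns -1.
    static void dfsStyleSeek(long pos, long length) throws IOException {
        if (pos > length) {
            throw new EOFException("Cannot seek after EOF");
        }
        // ... position the stream at pos ...
    }

    public static void main(String[] args) throws IOException {
        long length = 100;
        dfsStyleSeek(length, length); // succeeds: no-op seek to EOF
        try {
            s3StyleSeek(length, length);
        } catch (EOFException e) {
            System.out.println("s3-style seek to EOF failed: " + e.getMessage());
        }
    }
}
```

This matches the stack trace below: DataInputStream.skipBytes can legitimately
land the stream exactly at EOF, which DFSInputStream tolerates but
NativeS3FsInputStream does not.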

This is the complete exception stack:
Caused by: java.io.EOFException: Attempted to seek or read past the end of the file
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:462)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:234)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:205)
at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:67)
at java.io.DataInputStream.skipBytes(DataInputStream.java:220)
at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:739)
at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1720)
at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1898)
at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:149)
at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
... 15 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)