[ 
https://issues.apache.org/jira/browse/HADOOP-12444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224228#comment-15224228
 ] 

Steve Loughran commented on HADOOP-12444:
-----------------------------------------

OK. I'd like you to assume that HADOOP-12994 is in, and do a patch on top of it. 

# it explicitly adds tests for read and readFully
# it's got a standard validation of read/readFully args, with standard 
exceptions and messages, for consistency everywhere...no need to paste in from 
elsewhere
# its S3AInputStream reading of the underlying HTTP input streams explicitly 
handles EOF exceptions coming in from HTTP. This turns out to matter because 
HDFS doesn't raise reading past the EOF as an EOF exception; it returns -1 
for the caller to interpret.
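
To make the last two points concrete, here is a minimal sketch, not the code
from HADOOP-12994. The class and method names are hypothetical, and the choice
of exception types and messages is an assumption made for illustration. It
shows one shared argument check for positioned reads, and a read wrapper that
turns an EOFException from the underlying stream into the -1 that callers
expect.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class PositionedReadSketch {

  private PositionedReadSketch() {
  }

  /**
   * One shared check of (position, buffer, offset, length), so every
   * stream fails the same way with the same message.
   * The exception types chosen here are assumptions.
   */
  static void validateArgs(long position, byte[] buffer, int offset, int length)
      throws EOFException {
    if (buffer == null) {
      throw new NullPointerException("Null buffer");
    }
    if (position < 0) {
      throw new EOFException("Cannot read from a negative position: " + position);
    }
    if (offset < 0 || length < 0 || length > buffer.length - offset) {
      throw new IndexOutOfBoundsException(
          "Bad offset/length: offset=" + offset + ", length=" + length
              + ", buffer size=" + buffer.length);
    }
  }

  /**
   * Read from the wrapped stream; an EOFException raised by the
   * underlying stream is reported as -1 rather than propagated.
   */
  static int readOrMinusOne(InputStream in, byte[] buffer, int offset, int length)
      throws IOException {
    try {
      return in.read(buffer, offset, length);
    } catch (EOFException e) {
      // Past the end of the data: let the caller interpret -1.
      return -1;
    }
  }
}
```

With this shape, a caller looping on read() sees the same -1 at the end of the
data whether the underlying stream signals it by return value or by exception.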

I know the patch isn't in *yet*, but it is a prerequisite for validating this 
patch, so you will have to code off it, I'm afraid.



> Consider implementing lazy seek in S3AInputStream
> -------------------------------------------------
>
>                 Key: HADOOP-12444
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12444
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.1
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: HADOOP-12444-004.patch, HADOOP-12444-005.patch, 
> HADOOP-12444.1.patch, HADOOP-12444.2.patch, HADOOP-12444.3.patch, 
> HADOOP-12444.WIP.patch, hadoop-aws-test-reports.tar.gz
>
>
> - Currently, "read(long position, byte[] buffer, int offset, int length)" is 
> not implemented in S3AInputStream (unlike DFSInputStream). So, 
> "readFully(long position, byte[] buffer, int offset, int length)" in 
> S3AInputStream goes through the default implementation of seek(), read(), 
> seek() in FSInputStream. 
> - However, seek() in S3AInputStream involves re-opening of connection to S3 
> every time 
> (https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L115).
>   
> - It would be good to consider having a lazy seek implementation to reduce 
> connection overheads to S3. (e.g. Presto implements lazy seek: 
> https://github.com/facebook/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/PrestoS3FileSystem.java#L623)
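
The lazy seek idea in the description can be sketched as follows. This is an
illustration only, not the S3AInputStream patch and not Presto's code: the
class name, the `opener` function (assumed to open the object at a given byte
offset) and the `opens` counter are all hypothetical. seek() only records the
target position; the connection is reopened on the next read(), and only if
the target differs from where the current stream already is.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.LongFunction;

/** Hypothetical lazy-seek stream for illustration only. */
final class LazySeekStream extends InputStream {
  // Assumed helper: opens the remote object starting at a byte offset.
  private final LongFunction<InputStream> opener;
  private InputStream wrapped;  // current connection; null until first read
  private long streamPos;       // position of the wrapped stream
  private long targetPos;       // position requested by the last seek()
  int opens;                    // number of (re)opens, for demonstration

  LazySeekStream(LongFunction<InputStream> opener) {
    this.opener = opener;
  }

  /** Record the target only; no I/O happens here. */
  void seek(long pos) throws IOException {
    if (pos < 0) {
      throw new EOFException("Cannot seek to a negative position: " + pos);
    }
    targetPos = pos;
  }

  long getPos() {
    return targetPos;
  }

  @Override
  public int read() throws IOException {
    // Reopen only if there is no stream yet or a seek moved the target.
    if (wrapped == null || streamPos != targetPos) {
      if (wrapped != null) {
        wrapped.close();
      }
      wrapped = opener.apply(targetPos);
      streamPos = targetPos;
      opens++;
    }
    int b = wrapped.read();
    if (b >= 0) {
      streamPos++;
      targetPos++;
    }
    return b;
  }

  @Override
  public void close() throws IOException {
    if (wrapped != null) {
      wrapped.close();
      wrapped = null;
    }
  }
}
```

In this sketch a run of seek() calls with no read in between costs nothing,
and a seek to the current position does not trigger a reopen. It does not
attempt further optimisations such as skipping forward within an open stream.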



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
