[ 
https://issues.apache.org/jira/browse/HADOOP-13712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162390#comment-16162390
 ] 

Steven Rand commented on HADOOP-13712:
--------------------------------------

[~ste...@apache.org], I'm wondering whether it would be reasonable to add a new 
method to S3AFileSystem which is similar to {{open()}}, except that:

* The caller is responsible for providing the length of the file.
* The caller accepts that not all guarantees of {{FileSystem.open}} apply, 
i.e., we won't raise a FileNotFoundException (FNFE) if the file doesn't exist.
* We don't call {{getFileStatus}}, and instead just use the given length when 
constructing the S3AInputStream.

That way most callers can continue to call {{S3AFileSystem.open}} unaffected, 
while callers who already know the length of the file and are okay with the 
weaker guarantees can use the new method and skip the {{getFileStatus}} call. 
The use case I have in mind is applications that already query an external 
metastore or catalog before trying to read a file, and get the file's length 
and existence from there.
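
A rough sketch of the idea, using toy stand-in classes rather than the real S3A code. Everything here ({{ToyStore}}, {{ToyInputStream}}, and the method name {{openWithKnownLength}}) is hypothetical and only illustrates where the status lookup is skipped:

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these classes are the real S3A types.
class KnownLengthOpenSketch {

    /** Hypothetical in-memory object store that counts HEAD-style lookups. */
    static class ToyStore {
        final Map<String, byte[]> objects = new HashMap<>();
        int headRequests = 0;

        /** Stand-in for getFileStatus(): one lookup, FNFE if the key is absent. */
        long lengthOf(String key) throws FileNotFoundException {
            headRequests++;
            byte[] data = objects.get(key);
            if (data == null) {
                throw new FileNotFoundException("No such object: " + key);
            }
            return data.length;
        }
    }

    /** Stand-in for S3AInputStream: only remembers the key and length it was given. */
    static class ToyInputStream {
        final String key;
        final long contentLength;

        ToyInputStream(String key, long contentLength) {
            this.key = key;
            this.contentLength = contentLength;
        }
    }

    /** Existing behaviour: look up the status first, then build the stream. */
    static ToyInputStream open(ToyStore store, String key) throws FileNotFoundException {
        return new ToyInputStream(key, store.lengthOf(key));
    }

    /** Proposed variant (hypothetical name): trust the caller's length, no lookup, no FNFE. */
    static ToyInputStream openWithKnownLength(ToyStore store, String key, long length) {
        return new ToyInputStream(key, length);
    }

    public static void main(String[] args) throws FileNotFoundException {
        ToyStore store = new ToyStore();
        store.objects.put("data/part-0000", new byte[1024]);

        open(store, "data/part-0000");
        System.out.println("lookups after open(): " + store.headRequests);

        // Length would come from the external metastore/catalog in the real use case.
        openWithKnownLength(store, "data/part-0000", 1024L);
        System.out.println("lookups after openWithKnownLength(): " + store.headRequests);
    }
}
```

In this toy model the lookup counter does not increase on the second call. The trade-off is that a missing file would only be noticed later, when the stream is actually read.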

Do you think this would be a reasonable addition? If so I'm happy to submit a 
patch.

> S3A open to avoid needless HEAD on the successful execution path
> ----------------------------------------------------------------
>
>                 Key: HADOOP-13712
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13712
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>
> S3A's open() operation does a {{getFileStatus()}} check to see if a file is 
> not a directory before opening with a GET. That initial check will take up at 
> least one HEAD request if the file is present, more if it isn't.
> As the GET itself performs the existence check, it is needless. A successful 
> GET of a path which doesn't end in "/" means a file was there. The only 
> reason a getFileStatus call is needed is to choose which error message to 
> display if the path isn't there: is it an FNFE or is it path-is-directory.
> Proposed: reorder the code to do the GET; only if that fails, fall back to 
> getFileStatus()
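
The reordering proposed in the issue description can be illustrated with a toy model. This is a sketch under stated assumptions, not the S3A implementation: {{ToyStore}}, its {{get}} and {{isDirectory}} methods, and the choice of exception for the directory case are all placeholders.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: the store, its methods, and the exception choices are placeholders.
class GetFirstOpenSketch {

    /** Hypothetical in-memory store that counts status checks. */
    static class ToyStore {
        final Map<String, byte[]> objects = new HashMap<>();
        final Set<String> directories = new HashSet<>();
        int statusChecks = 0;

        /** Stand-in for the GET: it doubles as the existence check. */
        byte[] get(String key) throws FileNotFoundException {
            byte[] data = objects.get(key);
            if (data == null) {
                throw new FileNotFoundException("No such object: " + key);
            }
            return data;
        }

        /** Stand-in for getFileStatus(): only consulted on the failure path. */
        boolean isDirectory(String key) {
            statusChecks++;
            return directories.contains(key);
        }
    }

    /** GET first; fall back to the status check only to pick the error to report. */
    static byte[] open(ToyStore store, String key) throws IOException {
        try {
            return store.get(key);
        } catch (FileNotFoundException e) {
            if (store.isDirectory(key)) {
                throw new IOException("Path is a directory: " + key);
            }
            throw e; // genuinely missing: keep the FNFE
        }
    }

    public static void main(String[] args) throws IOException {
        ToyStore store = new ToyStore();
        store.objects.put("data/file", new byte[] {1, 2, 3});

        byte[] data = open(store, "data/file");
        System.out.println("read " + data.length + " bytes with "
                + store.statusChecks + " status checks");
    }
}
```

On the successful path the status check is never invoked; it runs once per failed open, solely to distinguish a missing path from a directory.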


