[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963976#comment-13963976
 ] 

Steve Loughran commented on HDFS-6143:
--------------------------------------

Daryn,

Having spent time looking at traces of Swift FS operations, the combination of 
open+seek is ubiquitous, and it is expensive over long-distance links, 
especially when HTTP is involved.

But: we do expect {{open(path)}} to fail if the file isn't there; changing that 
would be a major change in expectations.

What would make sense, long term, is a new operation, {{openAt(Path, 
offset)}}. For any of the HTTP filesystems, this would do a GET from the offset 
at open time.
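
As a rough illustration of that idea, here is a minimal sketch of what an 
HTTP-backed {{openAt}} could do: issue one GET with a Range header starting at 
the offset, and fail at open time if the file is missing. The class 
{{OpenAtSketch}} and its methods are hypothetical names made up for this 
example; nothing here is an existing Hadoop API, and a real implementation 
would need redirects, authentication and retries.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical sketch only; not part of the Hadoop FileSystem API. */
final class OpenAtSketch {

    /** Builds an HTTP Range header value meaning "from offset to end of file". */
    static String rangeFrom(long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("negative offset: " + offset);
        }
        return "bytes=" + offset + "-";
    }

    /** Opens the URL with a single GET that already starts at the given offset. */
    static InputStream openAt(URL url, long offset) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        if (offset > 0) {
            conn.setRequestProperty("Range", rangeFrom(offset));
        }
        int status = conn.getResponseCode();
        if (status == HttpURLConnection.HTTP_NOT_FOUND) {
            // The missing file is reported at open time, not on the first read.
            throw new FileNotFoundException(url.toString());
        }
        // Expect 206 Partial Content for a ranged request, 200 OK otherwise.
        int expected = offset > 0 ? HttpURLConnection.HTTP_PARTIAL
                                  : HttpURLConnection.HTTP_OK;
        if (status != expected) {
            conn.disconnect();
            throw new IOException("unexpected HTTP status " + status + " for " + url);
        }
        return conn.getInputStream();
    }
}
```

The point of the sketch is that the open and the initial seek collapse into one 
round trip, while {{open(path)}} keeps its existing fail-fast contract.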

Short term, looking at the {{ByteRangeInputStream}}, it's inefficient in that 
even for a single-byte forward seek, {{seek(getPos()+1)}}, it closes the 
connection and re-opens it, which adds the cost of setting up the connection 
and resets all flow control data on the channel. If you look at 
{{SwiftNativeInputStream}} you can see how it does read-ahead for short-range 
seeks, which is a lot more efficient for any code that is reading and skipping 
ahead. Someone should think about doing that here, as it would reduce the cost 
of those seeks.
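
To make the short-seek idea concrete, here is a minimal sketch, assuming a 
stream that can be re-opened at an arbitrary offset. A forward seek within a 
small threshold is served by reading and discarding bytes on the existing 
stream; anything else falls back to close-and-reopen. {{SkipAheadStream}}, 
{{Opener}}, {{maxSkip}} and {{reopenCount}} are hypothetical names for this 
example; this is not the actual code of {{SwiftNativeInputStream}} or 
{{ByteRangeInputStream}}.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: serve short forward seeks by reading ahead. */
final class SkipAheadStream {

    /** Opens the underlying data at an offset (stands in for a ranged GET). */
    interface Opener {
        InputStream openAt(long offset) throws IOException;
    }

    private final Opener opener;
    private final long maxSkip;   // largest forward gap served without reopening
    private InputStream in;
    private long pos;
    int reopenCount;              // exposed so the effect can be observed

    SkipAheadStream(Opener opener, long maxSkip) throws IOException {
        this.opener = opener;
        this.maxSkip = maxSkip;
        this.in = opener.openAt(0);
    }

    long getPos() {
        return pos;
    }

    int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    void seek(long target) throws IOException {
        if (target < 0) {
            throw new IOException("negative seek: " + target);
        }
        long gap = target - pos;
        if (gap >= 0 && gap <= maxSkip) {
            // Short forward seek: discard bytes on the open connection.
            while (pos < target && read() >= 0) {
                // keep reading until the target or end of stream
            }
            if (pos == target) {
                return;
            }
        }
        // Backward, long, or past-EOF seek: pay for a new connection.
        in.close();
        in = opener.openAt(target);
        pos = target;
        reopenCount++;
    }

    /** In-memory opener for demonstration; a real one would issue a GET. */
    static Opener overBytes(byte[] data) {
        return offset -> {
            int off = (int) Math.min(offset, data.length);
            return new ByteArrayInputStream(data, off, data.length - off);
        };
    }
}
```

With a threshold of 16 bytes, {{seek(getPos()+1)}} leaves {{reopenCount}} 
unchanged, while a backward seek or a jump well beyond the threshold increments 
it. The right threshold depends on link latency and bandwidth and would need 
measuring.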

> WebHdfsFileSystem open should throw FileNotFoundException for non-existing 
> paths
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-6143
>                 URL: https://issues.apache.org/jira/browse/HDFS-6143
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>            Priority: Blocker
>             Fix For: 2.5.0
>
>         Attachments: HDFS-6143-branch-2.4.0.v01.patch, 
> HDFS-6143-trunk-after-HDFS-5570.v01.patch, 
> HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, 
> HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, 
> HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch
>
>
> WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handle 
> non-existing paths. 
> - 'open' does not really open anything, i.e., it does not contact the 
> server, and therefore cannot discover FileNotFound; the error is deferred 
> until the next read. This is counterintuitive and not how local FS or HDFS 
> work. In POSIX you get ENOENT on open. 
> [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
>  is an example of the code that's broken because of this.
> - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
> instead of SC_NOT_FOUND for non-existing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)
