[ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963976#comment-13963976 ]
Steve Loughran commented on HDFS-6143:
--------------------------------------

Daryn,

Having spent time looking at traces of Swift FS operations, I can say the combination of open+seek is ubiquitous, and it is expensive over long-distance links, especially with HTTP in the story.

But: we do expect {{open(path)}} to fail if it's not there -changing that is a major change in expectations.

What would make sense -long term- is for a new operation {{openAt(Path, offset)}}. For any of the HTTP filesystems, this would do a GET from the offset at open time.

Short term, looking at the {{ByteRangeInputStream}}, it's inefficient in that for even a single byte forward seek ({{seek(getPos()+1)}}), it closes the connection and re-opens it, which adds the cost of setting up the connection and resets all flow control data on the channel. If you look at {{SwiftNativeInputStream}} you can see how it does read-ahead for short range seeks, which is a lot more efficient for any code that is reading and skipping ahead. Someone should think about doing that as it would reduce the cost of those seeks.

> WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-6143
>                 URL: https://issues.apache.org/jira/browse/HDFS-6143
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>            Priority: Blocker
>             Fix For: 2.5.0
>
>         Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch, HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch
>
>
> WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handle non-existing paths.
> - 'open' does not really open anything, i.e., it does not contact the server, and therefore cannot discover FileNotFound; the failure is deferred until the next read. It's counterintuitive and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java] is an example of the code that's broken because of this.
> - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND for non-existing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)
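
For illustration, the short-range read-ahead idea from the comment above might look roughly like the sketch below. This is not the actual {{ByteRangeInputStream}} or {{SwiftNativeInputStream}} code: the class {{ReadAheadSeekStream}}, the {{RangeOpener}} hook (standing in for "issue a GET from this offset"), the {{readAheadLimit}} threshold and the {{inMemory}} helper are all hypothetical names chosen for the example. A forward seek within the threshold is served by skipping bytes on the existing stream; anything else closes and re-opens at the target offset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of read-ahead on short forward seeks; not Hadoop code. */
class ReadAheadSeekStream extends InputStream {

    /** Hypothetical hook standing in for "open a connection at this offset". */
    public interface RangeOpener {
        InputStream openAt(long offset) throws IOException;
    }

    private final RangeOpener opener;
    private final long readAheadLimit; // assumed tunable threshold, in bytes
    private InputStream in;
    private long pos;
    private int reopenCount; // counts opens, so the saving is observable

    ReadAheadSeekStream(RangeOpener opener, long readAheadLimit) throws IOException {
        this.opener = opener;
        this.readAheadLimit = readAheadLimit;
        this.in = opener.openAt(0);
        this.reopenCount = 1;
    }

    long getPos() { return pos; }

    int getReopenCount() { return reopenCount; }

    void seek(long target) throws IOException {
        if (target < 0) {
            throw new IOException("Negative seek offset: " + target);
        }
        long delta = target - pos;
        if (delta == 0) {
            return; // already there
        }
        if (delta > 0 && delta <= readAheadLimit) {
            // Short forward seek: discard bytes on the open stream
            // instead of tearing the connection down.
            long remaining = delta;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    if (in.read() < 0) {
                        break; // hit EOF before reaching the target
                    }
                    skipped = 1;
                }
                remaining -= skipped;
            }
            if (remaining == 0) {
                pos = target;
                return;
            }
        }
        // Backward seek, long forward seek, or EOF while skipping: re-open.
        in.close();
        in = opener.openAt(target);
        reopenCount++;
        pos = target;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** In-memory opener used only to exercise the sketch. */
    static RangeOpener inMemory(byte[] data) {
        return offset -> {
            int start = (int) Math.min(offset, data.length);
            return new ByteArrayInputStream(data, start, data.length - start);
        };
    }
}
```

With a threshold of a few bytes, {{seek(getPos()+1)}} leaves {{getReopenCount()}} unchanged, while a long or backward seek increments it. The right threshold for a real HTTP filesystem would depend on link latency and bandwidth and is left open here.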