[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759428#comment-13759428 ]

Owen O'Malley commented on HDFS-4953:
-------------------------------------

This API seems overly complicated:
* Users want a single interface to read their files, regardless of whether it 
is zero copy, local, or remote.
* Because the base class FSDataInputStream implements the marker class, all 
filesystems will have the marker class regardless of whether they support zero 
copy.
* The various set/get methods are confusing as to who is supposed to set and 
who is supposed to get.

Therefore, I'd propose a follow-up JIRA in which the API change is to extend
FSDataInputStream with:
  ByteBuffer readByteBuffer(int length) throws IOException;

Relative to the current API:
* It is always a partial read: it may return fewer than length bytes, which is
obviously unavoidable for zero copy.
* It is supported for all filesystems, and the default implementation fills a
byte buffer from the underlying stream up to the desired length.
* It never returns a byte buffer with remaining == 0 except at end of file.
* The returned ByteBuffer is only guaranteed to be valid until the next read on
the same stream; it will be reused by the next readByteBuffer call.
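A minimal Java sketch of the default (copying) behavior described in the list, assuming a plain InputStream underneath. DefaultByteBufferReader is a hypothetical name, not a Hadoop class, and rejecting length <= 0 is an added assumption so that an empty result can only mean end of file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of the proposed default readByteBuffer(int) semantics,
 * written as a wrapper over a plain InputStream (not the real FSDataInputStream).
 */
final class DefaultByteBufferReader {
  private final InputStream in;
  // Reused across calls, so a returned buffer is only valid until the next read.
  private ByteBuffer reused = ByteBuffer.allocate(0);

  DefaultByteBufferReader(InputStream in) {
    this.in = in;
  }

  /**
   * Copies up to length bytes from the underlying stream into the reused buffer.
   * The result has remaining() == 0 only at end of file.
   */
  ByteBuffer readByteBuffer(int length) throws IOException {
    if (length <= 0) {
      // Assumption: zero-length requests are rejected so that an empty
      // result always means end of file.
      throw new IllegalArgumentException("length must be positive: " + length);
    }
    if (reused.capacity() < length) {
      reused = ByteBuffer.allocate(length);
    }
    byte[] array = reused.array();
    int filled = 0;
    // Keep reading until the requested length is reached or the stream ends.
    while (filled < length) {
      int n = in.read(array, filled, length - filled);
      if (n < 0) {
        break; // end of file
      }
      filled += n;
    }
    reused.clear();
    reused.limit(filled);
    return reused;
  }
}
```

Because the same buffer is handed back on every call, a caller that needs the bytes later must copy them out before reading again.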

This lets us switch the readers to readByteBuffer and take advantage of zero
copy when it is available, without maintaining two completely different code
paths and switching between them when we get an exception.
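To illustrate the single-code-path argument, here is a hedged sketch of a consumer that reads a whole stream only through readByteBuffer. ByteBufferSource is a hypothetical stand-in interface for the proposed method, and summing the bytes is only a placeholder for real processing; the loop relies solely on the contract listed above (partial reads, an empty buffer only at end of file, buffer reuse).

```java
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical stand-in for the proposed FSDataInputStream method. */
interface ByteBufferSource {
  ByteBuffer readByteBuffer(int length) throws IOException;
}

final class WholeStreamReader {
  /**
   * Reads the entire stream through readByteBuffer alone. The same loop works
   * whether the source returns mmap'd (zero-copy) buffers or copied ones.
   */
  static long sumUnsignedBytes(ByteBufferSource src, int chunkSize) throws IOException {
    long sum = 0;
    while (true) {
      ByteBuffer buf = src.readByteBuffer(chunkSize);
      if (!buf.hasRemaining()) {
        return sum; // per the proposed contract, an empty buffer means end of file
      }
      // Consume the bytes now: the buffer may be reused by the next call.
      while (buf.hasRemaining()) {
        sum += buf.get() & 0xff;
      }
    }
  }
}
```

Note that the reader never checks which kind of buffer it received, so there is no exception-driven fallback path.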

Thoughts?
                
> enable HDFS local reads via mmap
> --------------------------------
>
>                 Key: HDFS-4953
>                 URL: https://issues.apache.org/jira/browse/HDFS-4953
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: HDFS-4949
>
>         Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, 
> HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, 
> HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access 
> files directly without going through the DataNode.  However, all of these 
> reads involve a copy at the operating system level, since they rely on the
> read()/pread() family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap.  This would enable 
> truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when
> checksums are disabled.  Later, we can use the DataNode's cache awareness to
> only perform zero-copy reads when we know that the checksum has already been
> verified.
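The mmap approach in the description can be illustrated with the JDK's standard FileChannel.map call. This is a generic sketch, not code from the HDFS-4953 patches: MmapRead and mapRange are hypothetical names, and, like the initial implementation described above, it performs no checksum verification. A single mapping is limited to Integer.MAX_VALUE bytes.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapRead {
  /**
   * Maps up to length bytes of a local file, starting at offset, read-only.
   * Reads from the returned buffer are served from the OS page cache without
   * a read()/pread() copy into a user-space buffer. No checksums are checked.
   */
  static MappedByteBuffer mapRange(Path file, long offset, int length) throws IOException {
    if (length < 0) {
      throw new IllegalArgumentException("length must be non-negative: " + length);
    }
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      long size = ch.size();
      if (offset < 0 || offset > size) {
        throw new EOFException("offset " + offset + " outside file of size " + size);
      }
      // Never map past end of file: a read-only channel cannot extend the file.
      long len = Math.min((long) length, size - offset);
      // The mapping remains valid after the channel is closed.
      return ch.map(FileChannel.MapMode.READ_ONLY, offset, len);
    }
  }
}
```

The JDK offers no public unmap call; the region is released when the buffer is garbage-collected, so a real client would need to manage mapping lifetimes carefully.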
