[ 
https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760761#comment-13760761
 ] 

Colin Patrick McCabe commented on HDFS-4953:
--------------------------------------------

Your proposed API doesn't address one of the big asks we had when designing 
ZCR, which is to provide a mechanism for notifying the user that he cannot get 
an mmap.  As I mentioned earlier, for performance reasons, many users who might 
like to have access to a 128 MB mmap segment do not want to copy into a 128MB 
backing buffer.  Doing such a large copy would blow the L2 cache (and 
possibly the page cache), and rather than improving performance, might degrade 
it.  Similarly, users don't want to get multiple byte buffers back-- the big 
advantage of mmap is getting a single buffer back (in the cases where that's 
possible).

What if the user wants to use a direct byte buffer as his fallback buffer?  
With the current code, that is easy-- I just call 
setFallbackBuffer(ByteBuffer.allocateDirect(...)).  With your proposed API,   
there's no way to do this.
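
To make the point concrete, here is a minimal sketch of a caller-supplied fallback buffer.  It is not the code from the patch: the class FallbackCursorSketch and its methods are hypothetical names, and a plain ReadableByteChannel stands in for the HDFS stream.  The only thing it is meant to show is that the caller picks the buffer (direct here) and gets the same instance back on every read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

/** Hypothetical stand-in for a read cursor; not the class from the patch. */
public class FallbackCursorSketch {
    private final ReadableByteChannel source;
    private ByteBuffer fallback; // allocated and owned by the caller

    public FallbackCursorSketch(ReadableByteChannel source) {
        this.source = source;
    }

    /** The caller decides heap vs. direct and the size; the cursor never allocates. */
    public void setFallbackBuffer(ByteBuffer buf) {
        this.fallback = buf;
    }

    /** Copying path: fills the caller's buffer once and returns that same buffer. */
    public ByteBuffer read() {
        if (fallback == null) {
            throw new IllegalStateException("no fallback buffer set");
        }
        fallback.clear();
        try {
            source.read(fallback); // may be a short read; at EOF the buffer stays empty
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        fallback.flip();
        return fallback;
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        FallbackCursorSketch cursor =
            new FallbackCursorSketch(Channels.newChannel(new ByteArrayInputStream(data)));
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        cursor.setFallbackBuffer(direct);
        ByteBuffer first = cursor.read();
        ByteBuffer second = cursor.read();
        // Both reads hand back the caller's own direct buffer.
        System.out.println(first == direct && second == direct);
        System.out.println(second.isDirect());
    }
}
```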

Creating a new ByteBuffer for each read is going to be slower than reusing the 
same ByteBuffer-- especially for direct ByteBuffers.  Sure, we could have some 
kind of ByteBuffer cache inside the FSDataInputStream, but that's going to be 
very complicated.  What if someone needs a ByteBuffer of size 100 but we only 
have ones of size 10 and 900 in the cache?  Do we use the big one for the small 
read or leave it around?  How long do we cache them?  Do we prefer the 
direct ones?  And so on.  Really, the only design that makes sense is having 
the user pass in the fallback buffer.  We do not want to be re-inventing malloc 
inside FSDataInputStream.

The design principles of the current API are:
* some users want a fallback path, and some don't.  We have to satisfy both.
* we don't want to manage buffers inside FSDataInputStream.  It's a messy and 
hard problem with no optimal solutions that fit all cases.
* nobody wants to receive more than one buffer in response to a read.
* most programmers don't correctly handle short reads, so there should be an 
option to disable them.
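
The principles above can be sketched in a few lines of plain java.nio.  This is only an illustration under stated assumptions, not the HDFS-4953 API: ZeroCopyPolicySketch, its read() signature, the mmapPossible flag (which stands in for whatever makes mmap unavailable) and the openTemp() helper are all made up for the example.  A null fallback means the caller does not want a copying path and is told so; exactly one buffer comes back; short reads happen only when the caller allows them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of the design principles; not the patch's API. */
public final class ZeroCopyPolicySketch {
    /**
     * Returns exactly one buffer holding the bytes at [pos, pos + len).
     * mmapPossible simulates the conditions under which mmap can be used.
     * fallback may be null, meaning "no copying path, tell me instead".
     */
    public static ByteBuffer read(FileChannel ch, long pos, int len,
                                  boolean mmapPossible, ByteBuffer fallback,
                                  boolean allowShortReads) {
        try {
            if (mmapPossible) {
                return ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
            }
            if (fallback == null) {
                // The caller asked not to copy, so report that no mmap is available.
                throw new UnsupportedOperationException(
                    "mmap unavailable and no fallback buffer was supplied");
            }
            if (fallback.capacity() < len) {
                throw new IllegalArgumentException("fallback buffer too small");
            }
            fallback.clear();
            fallback.limit(len);
            long at = pos;
            do {
                int n = ch.read(fallback, at);
                if (n < 0) {
                    break; // end of file
                }
                at += n;
            } while (!allowShortReads && fallback.hasRemaining());
            fallback.flip();
            return fallback;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper (hypothetical): writes data to a temp file and opens it read-only. */
    public static FileChannel openTemp(byte[] data) {
        try {
            Path p = Files.createTempFile("zcr-sketch", ".bin");
            p.toFile().deleteOnExit();
            Files.write(p, data);
            return FileChannel.open(p, StandardOpenOption.READ);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel ch = openTemp(new byte[] {1, 2, 3, 4, 5, 6, 7, 8})) {
            ByteBuffer mapped = read(ch, 2, 4, true, null, false);
            System.out.println("mmap path: " + mapped.remaining() + " bytes");
            ByteBuffer copied = read(ch, 2, 4, false, ByteBuffer.allocate(16), false);
            System.out.println("fallback path: " + copied.remaining() + " bytes");
        }
    }
}
```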

One thing that we could and should do is provide a generic fallback path that 
is independent of the filesystem.
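
One possible shape for such a generic path, sketched with nothing but java.io and java.nio: the names GenericFallbackSketch and MmapCapable are hypothetical and do not come from Hadoop.  The idea is that streams which can mmap opt in through an interface, and every other InputStream falls through to an ordinary copy into the caller's buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/** Hypothetical filesystem-independent fallback; not part of Hadoop. */
public final class GenericFallbackSketch {
    /** Hypothetical opt-in interface; tryMmap returns null when mmap is not possible. */
    public interface MmapCapable {
        ByteBuffer tryMmap(int len);
    }

    /** Returns one buffer: the mapping if the stream offers one, else the filled fallback. */
    public static ByteBuffer read(InputStream in, int len, ByteBuffer fallback) {
        if (in instanceof MmapCapable) {
            ByteBuffer mapped = ((MmapCapable) in).tryMmap(len);
            if (mapped != null) {
                return mapped;
            }
        }
        if (fallback == null) {
            throw new UnsupportedOperationException(
                "mmap unavailable and no fallback buffer was supplied");
        }
        // Generic path: works for any InputStream, heap or direct fallback buffer.
        fallback.clear();
        int want = Math.min(len, fallback.capacity());
        byte[] chunk = new byte[Math.min(want, 8192)];
        try {
            while (fallback.position() < want) {
                int n = in.read(chunk, 0, Math.min(chunk.length, want - fallback.position()));
                if (n < 0) {
                    break; // end of stream
                }
                fallback.put(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        fallback.flip();
        return fallback;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30, 40, 50});
        ByteBuffer buf = ByteBuffer.allocateDirect(8);
        System.out.println(read(in, 3, buf).remaining());
        System.out.println(read(in, 10, buf).remaining());
    }
}
```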
                
> enable HDFS local reads via mmap
> --------------------------------
>
>                 Key: HDFS-4953
>                 URL: https://issues.apache.org/jira/browse/HDFS-4953
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: HDFS-4949
>
>         Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, 
> HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, 
> HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access 
> files directly without going through the DataNode.  However, all of these 
> reads involve a copy at the operating system level, since they rely on the 
> read() / pread() / etc family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap.  This would enable 
> truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when 
> checksums are disabled.  Later, we can use the DataNode's cache awareness to 
> only perform zero-copy reads when we know that the checksum has already been 
> verified.
