[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759764#comment-13759764 ]
Colin Patrick McCabe commented on HDFS-4953:
--------------------------------------------

The current API is generic and not HDFS-specific. You get a zero-copy cursor from {{FSDataInputStream#createZeroCopyCursor}}, and you read from it with {{ZeroCopyCursor#read}}. Then, when you're done, you close it with {{ZeroCopyCursor#close}}.

It also supports a fallback path. In order to have a fallback path, you must call {{ZeroCopyCursor#setFallbackBuffer}}. That provides the cursor with a fallback buffer which will be used when an mmap is unavailable.

The big problem with "well, just return a ByteBuffer, then!" is that ByteBuffer has no close method. So it's unclear how the mmap would ever be released. It is not adequate to rely on the GC, since we are talking about file descriptors here.

Furthermore, there are a lot of applications where "valid until next read() call or close of stream" is not good enough. Sometimes people want to do multiple reads and look at the results for each, and we should accommodate them.

Many prospective users of zero-copy are not interested in dealing with many small buffers. They want to deal with either a single big contiguous mmap'ed memory area, or just do reads the standard way, performing many small reads that only access as much as they need. The ability to turn off "fallback mode" (where we fall back to copying to service your read) was very specifically added in response to these users.

I think any reasonable design will end up looking a lot like what we already did in this JIRA. I suppose instead of separating {{createZeroCopyCursor}} and {{read}}, we could have combined them, but that would have resulted in a function call with a lot more parameters. The current design also prevents the scenario where more and more function variants get added over time with more and more parameters, the kind of function overload hell we landed in with {{FileSystem#create}}.
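To make the create / read / close lifecycle and the optional fallback buffer concrete, here is a minimal, self-contained sketch using only java.nio. It is NOT the code from the HDFS-4953 patches: {{CursorSketch}}, {{SketchCursor}}, {{readPrefix}}, {{getData}} and the {{mmapAvailable}} flag are hypothetical names chosen for illustration, and the real {{ZeroCopyCursor}} signatures may differ.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; not the ZeroCopyCursor from the patch. */
public class CursorSketch {

    /** A cursor with an explicit lifetime, unlike a bare ByteBuffer. */
    static final class SketchCursor implements Closeable {
        private final FileChannel channel;
        private final boolean mmapAvailable; // assumption: stands in for "an mmap is available"
        private ByteBuffer fallback;         // null means fallback mode is off
        private ByteBuffer data;             // result of the most recent read
        private long position;

        SketchCursor(Path path, boolean mmapAvailable) throws IOException {
            this.channel = FileChannel.open(path, StandardOpenOption.READ);
            this.mmapAvailable = mmapAvailable;
        }

        /** Opts in to the copying path used when no mmap is available. */
        void setFallbackBuffer(ByteBuffer buffer) {
            this.fallback = buffer;
        }

        /** Makes up to toRead bytes available via getData(); returns the count, or -1 at EOF. */
        int read(int toRead) throws IOException {
            int len = (int) Math.min(toRead, channel.size() - position);
            if (len <= 0) {
                return -1;
            }
            if (mmapAvailable) {
                // Zero-copy path: the buffer is backed directly by the page cache.
                data = channel.map(FileChannel.MapMode.READ_ONLY, position, len);
            } else if (fallback != null) {
                // Fallback path: copy into the caller-supplied buffer.
                fallback.clear();
                fallback.limit(Math.min(len, fallback.capacity()));
                while (fallback.hasRemaining()) {
                    if (channel.read(fallback, position + fallback.position()) < 0) {
                        break;
                    }
                }
                fallback.flip();
                data = fallback;
                len = data.remaining();
            } else {
                throw new UnsupportedOperationException(
                    "mmap unavailable and no fallback buffer set");
            }
            position += len;
            return len;
        }

        ByteBuffer getData() {
            return data;
        }

        /**
         * Releases the file descriptor. Note that a plain JVM only unmaps a
         * MappedByteBuffer when it is garbage collected; a real implementation
         * would have to manage the mapping's lifetime itself.
         */
        @Override
        public void close() throws IOException {
            data = null;
            channel.close();
        }
    }

    /** Reads the first n bytes of a file through the cursor and decodes them as ASCII. */
    public static String readPrefix(Path path, int n, boolean mmapAvailable,
                                    ByteBuffer fallback) throws IOException {
        try (SketchCursor cursor = new SketchCursor(path, mmapAvailable)) {
            if (fallback != null) {
                cursor.setFallbackBuffer(fallback);
            }
            if (cursor.read(n) < 0) {
                return "";
            }
            return StandardCharsets.US_ASCII.decode(cursor.getData()).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("cursor-sketch", ".bin");
        p.toFile().deleteOnExit();
        Files.write(p, "hello zero-copy".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readPrefix(p, 5, true, null));                     // mmap path
        System.out.println(readPrefix(p, 5, false, ByteBuffer.allocate(64))); // fallback path
    }
}
```

In this sketch, a caller that never calls {{setFallbackBuffer}} gets an exception instead of a silent copy when no mmap is available. That is one way to model turning off "fallback mode". Because the cursor is {{Closeable}}, try-with-resources gives it the explicit release point that a bare ByteBuffer lacks.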
> enable HDFS local reads via mmap
> --------------------------------
>
>                 Key: HDFS-4953
>                 URL: https://issues.apache.org/jira/browse/HDFS-4953
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>   Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: HDFS-4949
>
>         Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch,
> HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch,
> HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access
> files directly without going through the DataNode. However, all of these
> reads involve a copy at the operating system level, since they rely on the
> read() / pread() / etc. family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap. This would enable
> truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when
> checksums are disabled. Later, we can use the DataNode's cache awareness to
> only perform zero-copy reads when we know that the checksum has already been
> verified.