[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763167#comment-13763167 ]
Owen O'Malley commented on HDFS-4953:
-------------------------------------

Thanks, Colin, for giving more details of the design. Your new API is much better, but a few issues remain:
* If an application needs to determine whether zero copy is available, it should be able to do so without catching exceptions.
* What happens if the user reads across a block boundary? Most applications don't care about block boundaries and shouldn't have to add special code to cut their requests to block boundaries. That will impose inefficiencies.
* The cost of a second level of indirection (app -> ZeroCopy -> ByteBuffer) in the inner loop of the client seems prohibitive.
* Requiring pre-allocation of a fallback buffer that hopefully is never needed is really problematic. I'd propose that we flip this around to a factory.
* You either need to support short reads or return multiple ByteBuffers. I don't see a way to avoid both unless applications are forced to never read across block boundaries. That would be much worse than either of the other options. I'd prefer to have multiple ByteBuffers returned, but if you hate that more than short reads, I can handle short reads.
* It isn't clear to me how you plan to release mmapped buffers, since Java doesn't provide an API to do that. If you have a mechanism to do that, we need a releaseByteBuffer(ByteBuffer buffer) to release it.

I'd propose that we add the following to FSDataInputStream:

{code}
/**
 * Is the current location of the stream available via zero copy?
 */
public boolean isZeroCopyAvailable();

/**
 * Read from the current location at least 1 and up to maxLength bytes.
 * In most situations, the returned buffer will contain maxLength bytes
 * unless either:
 *  * the read crosses a block boundary and zero copy is being used
 *  * the stream has fewer than maxLength bytes left
 * The returned buffer will either be one that was created by the factory
 * or a MappedByteBuffer.
 */
public ByteBuffer readByteBuffer(ByteBufferFactory factory,
                                 int maxLength) throws IOException;

/**
 * Allow application to manage how ByteBuffers are created for fallback
 * buffers.
 */
public interface ByteBufferFactory {
  ByteBuffer createBuffer(int capacity);
}
{code}

> enable HDFS local reads via mmap
> --------------------------------
>
>                 Key: HDFS-4953
>                 URL: https://issues.apache.org/jira/browse/HDFS-4953
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: HDFS-4949
>
>         Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch,
> HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch,
> HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access
> files directly without going through the DataNode. However, all of these
> reads involve a copy at the operating system level, since they rely on the
> read() / pread() / etc. family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap. This would enable
> truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when
> checksums are disabled. Later, we can use the DataNode's cache awareness to
> only perform zero-copy reads when we know that the checksum has already been
> verified.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
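
To make the short-read semantics of the API proposed in the comment concrete, here is a minimal, self-contained sketch of how a caller might loop over readByteBuffer. Only isZeroCopyAvailable, readByteBuffer, and ByteBufferFactory come from the proposal; ZeroCopySketch, ZeroCopyReadable, FakeStream, and readFully are hypothetical names, and the in-memory stream is an assumption that merely imitates a block boundary. It is not the Hadoop implementation.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical stand-ins for the proposed API; not the real Hadoop classes. */
final class ZeroCopySketch {

    /** From the proposal: the application decides how fallback buffers are created. */
    interface ByteBufferFactory {
        ByteBuffer createBuffer(int capacity);
    }

    /** Assumed shape of the two methods proposed for FSDataInputStream. */
    interface ZeroCopyReadable {
        boolean isZeroCopyAvailable();
        ByteBuffer readByteBuffer(ByteBufferFactory factory, int maxLength) throws IOException;
    }

    /** In-memory fake: in "zero copy" mode a read never crosses a block boundary. */
    static final class FakeStream implements ZeroCopyReadable {
        private final ByteBuffer data;
        private final int blockSize;
        private final boolean zeroCopy;

        FakeStream(byte[] bytes, int blockSize, boolean zeroCopy) {
            this.data = ByteBuffer.wrap(bytes);
            this.blockSize = blockSize;
            this.zeroCopy = zeroCopy;
        }

        @Override
        public boolean isZeroCopyAvailable() {
            return zeroCopy;
        }

        @Override
        public ByteBuffer readByteBuffer(ByteBufferFactory factory, int maxLength) throws IOException {
            if (maxLength <= 0) {
                throw new IllegalArgumentException("maxLength must be positive");
            }
            if (!data.hasRemaining()) {
                throw new EOFException("end of stream"); // EOF behavior is an assumption
            }
            int length = Math.min(maxLength, data.remaining());
            if (zeroCopy) {
                // Short read: stop at the end of the current block.
                length = Math.min(length, blockSize - (data.position() % blockSize));
            }
            ByteBuffer view = data.slice();
            view.limit(length);
            data.position(data.position() + length);
            if (zeroCopy) {
                return view.asReadOnlyBuffer(); // stands in for a MappedByteBuffer
            }
            // Fallback path: copy into a buffer supplied by the application's factory.
            ByteBuffer copy = factory.createBuffer(length);
            copy.put(view);
            copy.flip();
            return copy;
        }
    }

    /** Caller loop that tolerates short reads without knowing about block boundaries. */
    static byte[] readFully(ZeroCopyReadable in, ByteBufferFactory factory,
                            int total, int chunk) throws IOException {
        byte[] out = new byte[total];
        int done = 0;
        while (done < total) {
            ByteBuffer buf = in.readByteBuffer(factory, Math.min(chunk, total - done));
            int n = buf.remaining();
            buf.get(out, done, n); // copied here only so the result is easy to inspect
            done += n;
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = new byte[10];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        ByteBufferFactory heap = ByteBuffer::allocate;
        ZeroCopyReadable in = new FakeStream(bytes, 4, true);
        ByteBuffer first = in.readByteBuffer(heap, 6);
        System.out.println("zero copy available: " + in.isZeroCopyAvailable()
                + ", bytes returned by first read: " + first.remaining());
    }
}
```

With the factory, the caller never pre-allocates a fallback buffer; one is created only when the copy path is actually taken. The price of this design is that the caller has to loop on short reads, which is the trade-off the comment weighs against returning multiple ByteBuffers.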