[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740551#comment-13740551 ]
Colin Patrick McCabe commented on HDFS-4953: -------------------------------------------- bq. brandon wrote: 1. DFSClient: looks like all the DFSClient instances share the same ClientMmapManager instance. If this is the case, why not have one static ClientMmapManager with a refcount to it, and remove ClientMmapManagerFactory class and variable mmapManager? I think it's better to have the refcount and manager instance encapsulated as private data inside an object, rather than floating around in the DFSClient class, because it prevents errors where someone might access the field without updating the reference count properly. bq. 2. HdfsZeroCopyCursor: might want to also initialize allowShortReads in the constructor. Users can't create instances of HdfsZeroCopyCursor directly (it's package-private). {{DFSInputStream#createZeroCopyConstructor}} creates them. We could start adding booleans to this function, but it seems clearer for people to just use setAllowShortReads. The kind of mess we have with FileSystem#create where there are a dozen different overloads and nobody can keep them straight is an antipattern. bq. ... Not sure which case is more expected by the users, shortReads allowed or disallowed. That's a good question. My experience has been that many developers don't handle short reads very well (sometimes including me). It's just another corner case that they have to remember to handle, and if they're not FS developers they often don't even realize that it can happen. So I have defaulted it to off unless it's explicitly requested. bq. 4. DFSInputStream: remove unused import and add debug level check for DFSClient.LOG.Debug(). OK bq. 5. TestBlockReader: Assume.assumeTrue(SystemUtils.IS_OS_UNIX), guess you meant IS_OS_LINUX mmap is present and supported on other UNIXes besides Linux bq. 6. test_libhdfs_zerocopy.c: remove repeated fixed bq. 7. TestBlockReaderLocal.java: remove unused import ok bq. 8. please add javadoc to some classes, e.g., ClientMap,ClientMapManager ok bq. andrew wrote: [hdfs-default.xml] Has some extra lines of java pasted in. fixed bq. Let's beef up the [zerocopycursor] javadoc I added an example. bq. read() javadoc: EOF here refers to an EOF when reading a block, not EOF of the HDFS file. Would prefer to see "end of block". EOF is only thrown at end-of-file, as described in the JavaDoc. bq. Would like to see explicit setting of allowShortReads to false in the constructor for clarity. done bq. serialVersionUID should be private ok bq. Maybe rename put to unref or close? It's not actually "putting" in the data structure sense, which is confusing. renamed to unref bq. let's not call people "bad programmers", just say "accidentally leaked references". I changed this to "code which leaks references accidentally" to make it more boring bq. unmap: add to javadoc that it should only be called if the manager has been closed, or by the manager with the lock held. I added "Should be called with the ClientMmapManager lock held" bq. Need a space before the "=". ok bq. Let's add some javadoc on... why it's important to cache [mmaps] added to ClientMmapManager bq. I think fromConf-style factory methods are more normally called get, e.g. FileSystem.get. FileSystem#get uses a cache, whereas ClientMmapManager#fromConf does not. I think it would be confusing to name them similarly... bq. Why is the CacheCleaner executor using half the timeout for the delay and period? Half the timeout period is the minimum period for which we can ensure that we time out mmaps on time. Think about if we used the timeout itself as the period. In that case, we might be 1 second away from the 15-minute (or whatever) expiration period when the cleaner thread runs. Then we have to wait another 15 minutes, effectively doubling the timeout. bq. We might in fact want to key off of System.nanoTime for fewer collisions Good point; changed. bq. I think evictOne would be clearer if you used TreeSet#pollFirst rather than an iterator. yeah, changed bq. This has 10 spaces, where elsewhere in the file you use a double-indent of 4. ok, I'll make it 4 bq. Remaining TODO for blocks bigger than 2GB, want to file a follow-on JIRA for this? filed bq. readZeroCopy catches and re-sets the interrupted status, does something else check this later? No. It would only happen if some third-party software delivered an InterruptedException to us. In that case the client is responsible for checking and doing something with the InterruptedException (or not). This all happens in the client thread. bq. Is it worth re-trying the mmap after a CacheCleaner period in case some space has been freed up in the cache? BlockReader objects get destroyed and re-created a lot. For example, a long seek will clear the current block reader. So will reading the rest of the block. So I doubt that re-trying will be useful, given that the block reader will probably be gone by that time anyway. Also, hopefully we are not seeing a lot of IOExceptions while trying to mmap, because that indicates some configuration issues. bq. The clientMmap from the manager can be null if the cache is full. I don't see a check for this case. That's a very good catch! Fixed. bq. JNI test has some commented out fprintfs fixed bq. Would like to see some tests for cache eviction behavior It's tricky because of the singleton nature. I'll look at creating another junit test where the limits are set lower. Similar with no backing buffer. > enable HDFS local reads via mmap > -------------------------------- > > Key: HDFS-4953 > URL: https://issues.apache.org/jira/browse/HDFS-4953 > Project: Hadoop HDFS > Issue Type: New Feature > Affects Versions: 2.3.0 > Reporter: Colin Patrick McCabe > Assignee: Colin Patrick McCabe > Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, > HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, > HDFS-4953.006.patch > > > Currently, the short-circuit local read pathway allows HDFS clients to access > files directly without going through the DataNode. However, all of these > reads involve a copy at the operating system level, since they rely on the > read() / pread() / etc family of kernel interfaces. > We would like to enable HDFS to read local files via mmap. This would enable > truly zero-copy reads. > In the initial implementation, zero-copy reads will only be performed when > checksums were disabled. Later, we can use the DataNode's cache awareness to > only perform zero-copy reads when we know that checksum has already been > verified. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira