[jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap

Colin Patrick McCabe (JIRA) Wed, 14 Aug 2013 18:37:48 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740551#comment-13740551
 ]

Colin Patrick McCabe commented on HDFS-4953:
--------------------------------------------

bq. brandon wrote: 1. DFSClient: looks like all the DFSClient instances share 
the same ClientMmapManager instance. If this is the case, why not have one 
static ClientMmapManager with a refcount to it, and remove 
ClientMmapManagerFactory class and variable mmapManager?

I think it's better to have the refcount and manager instance encapsulated as 
private data inside an object, rather than floating around in the DFSClient 
class, because it prevents errors where someone might access the field without 
updating the reference count properly.

bq. 2. HdfsZeroCopyCursor: might want to also initialize allowShortReads in the 
constructor. 

Users can't create instances of HdfsZeroCopyCursor directly (it's 
package-private).  {{DFSInputStream#createZeroCopyConstructor}} creates them.  
We could start adding booleans to this function, but it seems clearer for 
people to just use setAllowShortReads.  The kind of mess we have with 
FileSystem#create where there are a dozen different overloads and nobody can 
keep them straight is an antipattern.

bq. ... Not sure which case is more expected by the users, shortReads allowed 
or disallowed. 

That's a good question.  My experience has been that many developers don't 
handle short reads very well (sometimes including me).  It's just another 
corner case that they have to remember to handle, and if they're not FS 
developers they often don't even realize that it can happen.  So I have 
defaulted it to off unless it's explicitly requested.

bq. 4. DFSInputStream: remove unused import and add debug level check for 
DFSClient.LOG.Debug().

OK

bq. 5. TestBlockReader: Assume.assumeTrue(SystemUtils.IS_OS_UNIX), guess you 
meant IS_OS_LINUX

mmap is present and supported on other UNIXes besides Linux

bq. 6. test_libhdfs_zerocopy.c: remove repeated

fixed

bq. 7. TestBlockReaderLocal.java: remove unused import

ok

bq. 8. please add javadoc to some classes, e.g., ClientMap,ClientMapManager

ok

bq. andrew wrote: [hdfs-default.xml] Has some extra lines of java pasted in.

fixed

bq. Let's beef up the [zerocopycursor] javadoc

I added an example.

bq. read() javadoc: EOF here refers to an EOF when reading a block, not EOF of 
the HDFS file. Would prefer to see "end of block".

EOF is only thrown at end-of-file, as described in the JavaDoc.

bq. Would like to see explicit setting of allowShortReads to false in the 
constructor for clarity.

done

bq. serialVersionUID should be private

ok

bq. Maybe rename put to unref or close? It's not actually "putting" in the data 
structure sense, which is confusing.

renamed to unref

bq. let's not call people "bad programmers", just say "accidentally leaked 
references".

I changed this to "code which leaks references accidentally" to make it more 
boring

bq. unmap: add to javadoc that it should only be called if the manager has been 
closed, or by the manager with the lock held.

I added "Should be called with the ClientMmapManager lock held"

bq. Need a space before the "=".

ok

bq. Let's add some javadoc on... why it's important to cache [mmaps]

added to ClientMmapManager

bq. I think fromConf-style factory methods are more normally called get, e.g. 
FileSystem.get.

FileSystem#get uses a cache, whereas ClientMmapManager#fromConf does not.  I 
think it would be confusing to name them similarly...

bq. Why is the CacheCleaner executor using half the timeout for the delay and 
period?

Half the timeout period is the minimum period for which we can ensure that we 
time out mmaps on time.  Think about if we used the timeout itself as the 
period.  In that case, we might be 1 second away from the 15-minute (or 
whatever) expiration period when the cleaner thread runs.  Then we have to wait 
another 15 minutes, effectively doubling the timeout.

bq. We might in fact want to key off of System.nanoTime for fewer collisions

Good point; changed.

bq. I think evictOne would be clearer if you used TreeSet#pollFirst rather than 
an iterator.

yeah, changed

bq. This has 10 spaces, where elsewhere in the file you use a double-indent of 
4.

ok, I'll make it 4

bq. Remaining TODO for blocks bigger than 2GB, want to file a follow-on JIRA 
for this?

filed

bq. readZeroCopy catches and re-sets the interrupted status, does something 
else check this later?

No.  It would only happen if some third-party software delivered an 
InterruptedException to us.  In that case the client is responsible for 
checking and doing something with the InterruptedException (or not).  This all 
happens in the client thread.

bq. Is it worth re-trying the mmap after a CacheCleaner period in case some 
space has been freed up in the cache?

BlockReader objects get destroyed and re-created a lot.  For example, a long 
seek will clear the current block reader.  So will reading the rest of the 
block.  So I doubt that re-trying will be useful, given that the block reader 
will probably be gone by that time anyway.  Also, hopefully we are not seeing a 
lot of IOExceptions while trying to mmap, because that indicates some 
configuration issues.

bq. The clientMmap from the manager can be null if the cache is full. I don't 
see a check for this case.

That's a very good catch!  Fixed.

bq. JNI test has some commented out fprintfs

fixed

bq. Would like to see some tests for cache eviction behavior

It's tricky because of the singleton nature.  I'll look at creating another 
junit test where the limits are set lower.  Similar with no backing buffer.

> enable HDFS local reads via mmap
> --------------------------------
>
>                 Key: HDFS-4953
>                 URL: https://issues.apache.org/jira/browse/HDFS-4953
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, 
> HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, 
> HDFS-4953.006.patch
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access 
> files directly without going through the DataNode.  However, all of these 
> reads involve a copy at the operating system level, since they rely on the 
> read() / pread() / etc family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap.  This would enable 
> truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when 
> checksums were disabled.  Later, we can use the DataNode's cache awareness to 
> only perform zero-copy reads when we know that checksum has already been 
> verified.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap

Reply via email to