[ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897170#comment-13897170
 ] 

Andrew Wang commented on HDFS-5810:
-----------------------------------

Hi Colin, some replies and new comments. I looked at the remaining parts of the 
previous patch, I haven't looked at the newest rev yet:

Replies:
bq. <mmap.cache.timeout>
Sure, let's just doc it.

bq. polymorphic Object in SCReplica
Sure, this is just a style nit. If you tried it the other way and it looked 
worse, it's fine to leave it as is.

bq. <mmap and global lock>
I guess this is for my own edification, but isn't munmap going to be 
approximately the same cost as mmap? Both involve updating the page tables and 
a TLB flush AFAIK, which should be order microseconds. This could be pushed up 
to milliseconds if the page tables are swapped out, but that's again an issue 
for both. I'd like to be internally consistent with regard to our locking, if 
it's a performance argument.

Overall, I feel like microseconds are not a big deal, and mmap/munmap 
themselves have to grab a kernel lock. The code savings from removing the CV 
also aren't bad, since we could reduce the polymorphism of SCReplica#mmapData.

Some new comments too (I think I've looked at all the changed files at this 
point):

ClientContext:
* ClientContext#confAsString has a dupe of socketCacheExpiry. Do we also need 
the mmap cache settings here?
* ClientContext#getFromConf, can we push the creation of a new DFSClient.Conf 
into #get when it's necessary? Seems better to avoid doing all those hash 
lookups.

BlockReaderFactory:
* We removed the javadoc parameter descriptions in a few places, some of which 
were helpful (e.g. {{len}} of {{-1}} means read as many bytes as possible). 
Could we add the one-line docs back to the builder variables?
* Mind adding "dfs.client.cached.conn.retry" to hdfs-default.xml?
* cacheTries now counts down instead of counting up, so I think it needs a new 
name. cacheTriesRemaining isn't great, but something like that.
* cacheTries used to also only tick when we got a stale peer out of the cache. 
Now, nextTcpPeer and nextDomainPeer tick cacheTries unconditionally.
* Previously, we would disable domain sockets or throw an exception if we hit 
an error when using a new Peer (domain or TCP respectively). Now, we don't know 
if a peer is cached or new, and spin until we run out of cacheTries (which 
isn't really related here).

> Unify mmap cache and short-circuit file descriptor cache
> --------------------------------------------------------
>
>                 Key: HDFS-5810
>                 URL: https://issues.apache.org/jira/browse/HDFS-5810
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, 
> HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, 
> HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to