[ https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908793#comment-13908793 ]

Chris Nauroth commented on HDFS-5957:
-------------------------------------

Thank you [~kkambatl] for also taking a look.

bq. I think this is a problem within YARN, which has to be fixed inside YARN.

Did you have a specific implementation in mind?  Something like scanning 
/proc/pid/smaps and subtracting the clean pages from RSS?  I'm curious whether 
we'd increase the risk of thrashing.  It's probably worthwhile at this point to 
spin off a separate YARN issue to carry on that discussion, and focus on 
short-circuit read here.
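
Just to make sure we're picturing the same thing, here is a rough, untested 
sketch of that kind of smaps scan.  The class name and the choice to subtract 
the Shared_Clean/Private_Clean fields are my assumptions, not anything from an 
existing patch:

{code}
// Rough sketch only: estimate the "effective" RSS of a process by
// subtracting clean (easily reclaimable) pages, parsed from /proc/<pid>/smaps.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class SmapsRssEstimator {
  /** Returns RSS minus clean pages, in kilobytes, for the given pid. */
  public static long effectiveRssKb(int pid) throws IOException {
    long rssKb = 0;
    long cleanKb = 0;
    try (BufferedReader reader =
        new BufferedReader(new FileReader("/proc/" + pid + "/smaps"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.startsWith("Rss:")) {
          rssKb += parseKb(line);
        } else if (line.startsWith("Shared_Clean:")
            || line.startsWith("Private_Clean:")) {
          cleanKb += parseKb(line);
        }
      }
    }
    return rssKb - cleanKb;
  }

  // Field lines look like "Rss:                  4 kB".
  private static long parseKb(String line) {
    String[] parts = line.trim().split("\\s+");
    return Long.parseLong(parts[1]);
  }
}
{code}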

bq. The munmap calls are deterministic now. You can control the number of 
unused mmaps that we'll store by changing {{dfs.client.mmap.cache.size}}.

I may have misread this part of the code earlier.  I thought {{munmap}} could 
only ever get triggered from the background cleaner thread, but now I see that 
it can also get triggered on unreferencing a replica, which would be 
synchronous to the caller.

[~gopalv], I think it would be worthwhile to try reverting your setting for 
{{dfs.client.mmap.cache.timeout.ms}} and instead tuning 
{{dfs.client.mmap.cache.size}} down to a small value.  Here is the full 
documentation for that property.  (Note the large-ish default.)

{code}
<property>
  <name>dfs.client.mmap.cache.size</name>
  <value>1024</value>
  <description>
    When zero-copy reads are used, the DFSClient keeps a cache of recently used
    memory mapped regions.  This parameter controls the maximum number of
    entries that we will keep in that cache.

    If this is set to 0, we will not allow mmap.

    The larger this number is, the more file descriptors we will potentially
    use for memory-mapped files.  mmapped files also use virtual address space.
    You may need to increase your ulimit virtual address space limits before
    increasing the client mmap cache size.
  </description>
</property>
{code}
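
For illustration only, the override could go in hdfs-site.xml or be applied 
programmatically on the client side; the value 16 below is an arbitrary small 
starting point, not a recommendation:

{code}
// Illustrative only: apply a small, deterministic bound on the mmap cache
// via the client-side Configuration.  The same key can be set in
// hdfs-site.xml instead.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class MmapCacheConfigExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // 16 is just an example value, well below the default of 1024.
    conf.setInt("dfs.client.mmap.cache.size", 16);
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Client configured against " + fs.getUri());
  }
}
{code}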

bq.  As a workaround, have you considered reading into a direct ByteBuffer that 
you allocated yourself? DFSInputStream implements the ByteBufferReadable 
interface, which lets you read into any ByteBuffer. This would avoid the array 
copy that you're talking about.

Gopal, is this also worth trying?
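
For reference, here is a minimal, untested sketch of that approach; the path 
and buffer size are placeholders:

{code}
// Rough sketch only: read into a caller-allocated direct ByteBuffer via the
// ByteBufferReadable interface, avoiding the extra on-heap array copy.
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirectBufferReadExample {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/example-file");          // placeholder path
    ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
    try (FSDataInputStream in = fs.open(path)) {
      // FSDataInputStream delegates to the underlying DFSInputStream's
      // ByteBufferReadable implementation, so data lands directly in the
      // direct buffer with no intermediate byte[] copy.
      int bytesRead = in.read(buffer);
      System.out.println("Read " + bytesRead + " bytes");
    }
  }
}
{code}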


> Provide support for different mmap cache retention policies in 
> ShortCircuitCache.
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-5957
>                 URL: https://issues.apache.org/jira/browse/HDFS-5957
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.3.0
>            Reporter: Chris Nauroth
>
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by 
> multiple reads of the same block or by multiple threads.  The eventual 
> {{munmap}} executes on a background thread after an expiration period.  Some 
> client usage patterns would prefer strict bounds on this cache and 
> deterministic cleanup by calling {{munmap}}.  This issue proposes additional 
> support for different caching policies that better fit these usage patterns.


