[
https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908793#comment-13908793
]
Chris Nauroth commented on HDFS-5957:
-------------------------------------
Thank you [~kkambatl] for also taking a look.
bq. I think this is a problem within YARN, which has to be fixed inside YARN.
Did you have a specific implementation in mind? Something like trying to scan
/proc/pid/smaps and subtract the clean pages from RSS? I'm curious if we'd
increase the risk of thrashing. It's probably worthwhile at this point to spin
off a separate YARN issue to carry on the discussion, and focus on short-circuit
read here.
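To make the smaps idea concrete, here is a minimal sketch of "RSS minus clean pages". It assumes the usual Linux {{/proc/<pid>/smaps}} field names ({{Rss}}, {{Shared_Clean}}, {{Private_Clean}}, all reported in kB). The class and method names are hypothetical, and this is not a claim about how YARN measures memory or should.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Hadoop or YARN.
final class SmapsDirtyRss {
    /** Sums Rss over all mappings and subtracts Shared_Clean and Private_Clean (all in kB). */
    static long rssMinusCleanKb(List<String> smapsLines) {
        long rssKb = 0;
        long cleanKb = 0;
        for (String line : smapsLines) {
            if (line.startsWith("Rss:")) {
                rssKb += parseKb(line);
            } else if (line.startsWith("Shared_Clean:") || line.startsWith("Private_Clean:")) {
                cleanKb += parseKb(line);
            }
        }
        return rssKb - cleanKb;
    }

    private static long parseKb(String line) {
        // "Rss:        40 kB" splits into ["Rss:", "40", "kB"]
        String[] fields = line.trim().split("\\s+");
        return Long.parseLong(fields[1]);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current process; pass a pid to inspect another one (Linux only).
        String pid = args.length > 0 ? args[0] : "self";
        List<String> lines =
            Files.readAllLines(Paths.get("/proc", pid, "smaps"), StandardCharsets.UTF_8);
        System.out.println(rssMinusCleanKb(lines) + " kB");
    }
}
```

Note that this walks every mapping of the process on each sample, so polling cost grows with the number of mappings.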
bq. The munmap calls are deterministic now. You can control the number of
unused mmaps that we'll store by changing {{dfs.client.mmap.cache.size}}.
I may have misread this part of the code earlier. I thought {{munmap}} could
only ever get triggered from the background cleaner thread, but now I see that
it can also get triggered on unreferencing a replica, which would be
synchronous to the caller.
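For illustration only, the behavior I am describing (a size-bounded set of unused entries, with cleanup running on the caller's thread when the bound is exceeded) can be sketched with a plain {{LinkedHashMap}}. This is not the actual {{ShortCircuitCache}} code. The class name, the {{release}} callback (standing in for {{munmap}}), and the method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch; the real ShortCircuitCache is structured differently.
final class BoundedUnusedCache<K, V> {
    private final LinkedHashMap<K, V> unused;

    BoundedUnusedCache(final int maxEntries, final Consumer<V> release) {
        // Insertion order: the entry that has been parked longest is evicted first.
        this.unused = new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxEntries) {
                    release.accept(eldest.getValue()); // runs on the calling thread
                    return true;
                }
                return false;
            }
        };
    }

    /** Parks a value that is no longer referenced; may synchronously release the oldest one. */
    synchronized void unref(K key, V value) {
        unused.put(key, value);
    }

    /** Takes a parked value back for reuse, or returns null if it is not in the cache. */
    synchronized V reuse(K key) {
        return unused.remove(key);
    }

    synchronized int parkedCount() {
        return unused.size();
    }

    public static void main(String[] args) {
        List<String> released = new ArrayList<>();
        BoundedUnusedCache<Integer, String> cache = new BoundedUnusedCache<>(2, released::add);
        cache.unref(1, "region-1");
        cache.unref(2, "region-2");
        cache.unref(3, "region-3"); // exceeds the bound, so the oldest entry is released here
        System.out.println("released=" + released + " parked=" + cache.parkedCount());
    }
}
```

The point of the sketch is that a smaller bound makes the release happen sooner and on the thread that dropped the reference, rather than waiting for a timer.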
[~gopalv], I think it would be worthwhile to try reverting your setting for
{{dfs.client.mmap.cache.timeout.ms}} and instead downtune
{{dfs.client.mmap.cache.size}} to a small value. Here is the full
documentation for this property. (Note the large-ish default.)
{code}
<property>
<name>dfs.client.mmap.cache.size</name>
<value>1024</value>
<description>
When zero-copy reads are used, the DFSClient keeps a cache of recently used
memory mapped regions. This parameter controls the maximum number of
entries that we will keep in that cache.
If this is set to 0, we will not allow mmap.
The larger this number is, the more file descriptors we will potentially
use for memory-mapped files. mmaped files also use virtual address space.
You may need to increase your ulimit virtual address space limits before
increasing the client mmap cache size.
</description>
</property>
{code}
bq. As a workaround, have you considered reading into a direct ByteBuffer that
you allocated yourself? DFSInputStream implements the ByteBufferReadable
interface, which lets you read into any ByteBuffer. This would avoid the array
copy that you're talking about.
Gopal, is this also worth trying?
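In case it helps, the read loop for that workaround would look roughly like the sketch below. To keep it runnable without Hadoop on the classpath, it uses a plain {{java.nio}} {{ReadableByteChannel}} as a stand-in, since its {{read(ByteBuffer)}} has the same shape as {{ByteBufferReadable#read(ByteBuffer)}}. The stand-in channel copies internally, so this only demonstrates the calling pattern, not the copy savings. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; with HDFS the source would be the input stream's read(ByteBuffer).
final class DirectBufferRead {
    /** Fills buf from src until buf is full or src reaches end of stream; returns bytes read. */
    static int readFully(ReadableByteChannel src, ByteBuffer buf) throws IOException {
        int total = 0;
        while (buf.hasRemaining()) {
            int n = src.read(buf);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, short-circuit".getBytes(StandardCharsets.UTF_8);
        ReadableByteChannel src = Channels.newChannel(new ByteArrayInputStream(data));
        // The caller owns this buffer, so it can be reused across reads.
        ByteBuffer buf = ByteBuffer.allocateDirect(8);
        int n = readFully(src, buf);
        buf.flip(); // switch from filling to draining
        System.out.println(n + " bytes read, " + buf.remaining() + " ready to consume");
    }
}
```

Because the caller allocates the buffer, its lifetime is under the caller's control, which sidesteps the mmap cache retention question entirely.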
> Provide support for different mmap cache retention policies in
> ShortCircuitCache.
> ---------------------------------------------------------------------------------
>
> Key: HDFS-5957
> URL: https://issues.apache.org/jira/browse/HDFS-5957
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.3.0
> Reporter: Chris Nauroth
>
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by
> multiple reads of the same block or by multiple threads. The eventual
> {{munmap}} executes on a background thread after an expiration period. Some
> client usage patterns would prefer strict bounds on this cache and
> deterministic cleanup by calling {{munmap}}. This issue proposes additional
> support for different caching policies that better fit these usage patterns.