[
https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907700#comment-13907700
]
Gopal V commented on HDFS-5957:
-------------------------------
[~cnauroth]: mmap() does take up physical memory, assuming those pages are
mapped into RAM and are not disk-resident.
As long as we're on Linux, it will show up in RSS as well as marked in the
Shared_Clean/Referenced field in /proc/<pid>/smaps.
YARN could do a better job of distinguishing "how much memory will be freed up if
this process is killed" from "how much memory does this process use". But that is
a completely different issue.
When I set the mmap timeout to 1000 ms, some of my queries succeeded - mostly
the queries that were taking > 50 seconds.
But the really fast ORC queries, which take ~10 seconds to run, still hit
around 50 task failures out of ~3000 map tasks.
The perf dip happens because of those failures.
For small 200 GB data-sets (~1.4x tasks per container), ZCR does give a perf
boost because we get to use HADOOP-10047 instead of shuffling the data between
byte[] buffers for decompression.
> Provide support for different mmap cache retention policies in
> ShortCircuitCache.
> ---------------------------------------------------------------------------------
>
> Key: HDFS-5957
> URL: https://issues.apache.org/jira/browse/HDFS-5957
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.3.0
> Reporter: Chris Nauroth
>
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by
> multiple reads of the same block or by multiple threads. The eventual
> {{munmap}} executes on a background thread after an expiration period. Some
> client usage patterns would prefer strict bounds on this cache and
> deterministic cleanup by calling {{munmap}}. This issue proposes additional
> support for different caching policies that better fit these usage patterns.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)