[
https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907875#comment-13907875
]
Colin Patrick McCabe commented on HDFS-5957:
--------------------------------------------
I talked to [~kkambatl] about this. It seems that YARN is monitoring the
process's {{RSS}} (resident set size), which does include the physical memory
taken up by memory-mapped files. I think this is unfortunate. The physical
memory taken up by mmapped files is basically part of the page cache. If there
is any memory pressure at all, the kernel can easily purge this memory, since
the pages are "clean". Charging an application for this memory is similar to
charging it for the page cache consumed by calls to read(2): it doesn't really
make sense for this application. I think this is a problem within YARN, and it
has to be fixed inside YARN.
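For reference, here's a rough, Linux-only sketch showing that touched pages of
a memory-mapped file count toward {{RSS}}; the file path and sizes are made up
for illustration, and it assumes the file is under 2 GB:
{code:java}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RssMmapDemo {
  public static void main(String[] args) throws Exception {
    printRss("before mmap");
    // "/tmp/demo-file" is a placeholder; use any local file smaller than 2 GB.
    try (RandomAccessFile f = new RandomAccessFile("/tmp/demo-file", "r");
         FileChannel ch = f.getChannel()) {
      MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
      long sum = 0;
      for (long i = 0; i < ch.size(); i += 4096) {  // touch each page to fault it in
        sum += map.get((int) i);
      }
      printRss("after touching mapped pages (sum=" + sum + ")");
    }
  }

  // VmRSS in /proc/self/status counts clean, file-backed mapped pages too.
  static void printRss(String label) throws Exception {
    for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
      if (line.startsWith("VmRSS:")) {
        System.out.println(label + ": " + line);
      }
    }
  }
}
{code}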
bq. It sounds like you really do need a deterministic way to trigger the munmap
calls, i.e. LRU caching or no caching at all described above.
The {{munmap}} calls are deterministic now. You can control the number of
unused mmaps that we'll store by changing {{dfs.client.mmap.cache.size}}.
It's very important to keep in mind that {{dfs.client.mmap.cache.size}}
controls the size of the cache, *not* the total number of mmaps. So if my
application has 10 threads that each use one mmap at a time, and the maximum
cache size is 10, I may have 20 mmaps in existence at any given time. The
maximum size of any mmap is the size of a block, so you should be able to use
these numbers to calculate how much RSS you will need.
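To make that arithmetic concrete, here's a rough sketch; the thread count,
cache size, and block size are illustrative assumptions, not values from this
issue:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class MmapBudgetSketch {
  public static void main(String[] args) {
    // All numbers here are illustrative assumptions.
    Configuration conf = new Configuration();
    conf.setInt("dfs.client.mmap.cache.size", 10);   // cap on *unused* cached mmaps

    int activeReaderThreads = 10;                    // threads each holding one mmap in use
    int cachedMmaps = conf.getInt("dfs.client.mmap.cache.size", 10);
    long blockSizeBytes = 128L * 1024 * 1024;        // assumed 128 MB HDFS block size

    // Worst case: every in-use mmap plus every cached-but-unused mmap is a full block.
    long worstCaseMappedBytes = (long) (activeReaderThreads + cachedMmaps) * blockSizeBytes;
    System.out.println("Worst-case mapped bytes: " + worstCaseMappedBytes);
  }
}
{code}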
bq. For small 200Gb data-sets (~1.4x tasks per container), ZCR does give a perf
boost because we get to use HADOOP-10047 instead of shuffling it between byte[]
buffers for decompression.
As a workaround, have you considered reading into a direct {{ByteBuffer}} that
you allocated yourself? {{DFSInputStream}} implements the
{{ByteBufferReadable}} interface, which lets you read into any {{ByteBuffer}}.
This would avoid the array copy that you're talking about.
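A minimal sketch of that workaround might look like the following (the path
and buffer size are placeholders):
{code:java}
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirectBufferReadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // "/tmp/example-input" is a placeholder path.
    try (FSDataInputStream in = fs.open(new Path("/tmp/example-input"))) {
      // Direct (off-heap) buffer; the size is an arbitrary choice for illustration.
      ByteBuffer buf = ByteBuffer.allocateDirect(4 * 1024 * 1024);
      int nRead = in.read(buf);   // ByteBufferReadable path: no intermediate byte[] copy
      buf.flip();
      System.out.println("Read " + nRead + " bytes into a direct buffer");
      // ... hand buf to the decompressor here ...
    }
  }
}
{code}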
I hope we can fix this within YARN soon, since otherwise the perf benefit of
zero-copy reads will be substantially reduced or eliminated (as will people's
ability to use ZCR in the first place).
> Provide support for different mmap cache retention policies in
> ShortCircuitCache.
> ---------------------------------------------------------------------------------
>
> Key: HDFS-5957
> URL: https://issues.apache.org/jira/browse/HDFS-5957
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.3.0
> Reporter: Chris Nauroth
>
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by
> multiple reads of the same block or by multiple threads. The eventual
> {{munmap}} executes on a background thread after an expiration period. Some
> client usage patterns would prefer strict bounds on this cache and
> deterministic cleanup by calling {{munmap}}. This issue proposes additional
> support for different caching policies that better fit these usage patterns.