[ 
https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912082#comment-13912082
 ] 

Gopal V commented on HDFS-5957:
-------------------------------

bq. As a workaround, have you considered reading into a direct ByteBuffer that 
you allocated yourself? 

That was attempted, and we do follow that codepath for remote reads. It is 
slower than the zero-copy read because, to produce a direct byte buffer, the JVM 
has to defragment memory and produce a large contiguous memory region. This 
triggered a full GC pass, which caused stragglers with container reuse.

On top of that, the direct readable path ignores all the mlocked memory in the 
DataNode, which means we end up spending twice as much physical memory for a 
cached block as with zero-copy reads. There is also the overhead of copying 
from the DN's mmap section into a JVM HeapByteBuffer, plus a checksum check, 
because this follows the Short-Circuit-Read pathway.
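
To make the copy-versus-map difference concrete, here is a minimal sketch in 
plain java.nio. It is an analogy only, not HDFS client code: the class 
{{ReadPaths}} and both method names are made up for illustration, and a local 
temp file stands in for a cached block. The first method copies the bytes into 
a freshly allocated direct buffer (a second copy of the data); the second maps 
the file, so the buffer is backed by the page cache pages themselves.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; none of these names come from Hadoop.
final class ReadPaths {

    // Copying path: allocate a direct buffer as large as the file and let
    // the channel copy the file contents into it.
    static ByteBuffer readIntoDirectBuffer(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or EOF is reached
            }
            buf.flip();
            return buf;
        }
    }

    // Mapping path: no user-space copy is made; the returned buffer is a
    // view over the mapped file region and stays valid after the channel
    // is closed.
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("block", ".dat");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {1, 2, 3, 4});

        ByteBuffer copied = readIntoDirectBuffer(file);
        ByteBuffer mapped = mapReadOnly(file);

        // Both buffers expose the same bytes; only the backing memory differs.
        System.out.println("same contents: " + copied.equals(mapped));
    }
}
```

The sketch does not reproduce the GC behaviour or the checksum step described 
above; it only shows where the extra copy and the extra allocation come from.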

The whole point of the push toward zero-copy reads is to use off-heap memory 
here and leave heap space aside for the sort buffers, map-join memory and 
group-by top-n hashes.

I don't think using a slower codepath which takes up twice as much memory with 
more GC overhead is a good idea if this is to be a performance improvement at 
the end of it all.

> Provide support for different mmap cache retention policies in 
> ShortCircuitCache.
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-5957
>                 URL: https://issues.apache.org/jira/browse/HDFS-5957
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.3.0
>            Reporter: Chris Nauroth
>
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by 
> multiple reads of the same block or by multiple threads.  The eventual 
> {{munmap}} executes on a background thread after an expiration period.  Some 
> client usage patterns would prefer strict bounds on this cache and 
> deterministic cleanup by calling {{munmap}}.  This issue proposes additional 
> support for different caching policies that better fit these usage patterns.
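
One possible shape for a "strict bounds, deterministic cleanup" policy is 
sketched below. This is a hedged illustration, not the {{ShortCircuitCache}} 
implementation: {{BoundedRegionCache}}, its methods and the eviction callback 
are all hypothetical names, and the callback merely stands in for the 
{{munmap}} call. Entries are evicted synchronously in least-recently-used order 
as soon as the bound is exceeded, and a caller can also release an entry 
explicitly instead of waiting for a background thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch of a bounded cache with synchronous cleanup.
final class BoundedRegionCache<K, V> {
    private final int maxEntries;
    private final BiConsumer<K, V> onEvict; // stand-in for the munmap step
    // accessOrder=true makes iteration order least-recently-used first.
    private final LinkedHashMap<K, V> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    BoundedRegionCache(int maxEntries, BiConsumer<K, V> onEvict) {
        if (maxEntries < 0) {
            throw new IllegalArgumentException("maxEntries must be >= 0");
        }
        this.maxEntries = maxEntries;
        this.onEvict = onEvict;
    }

    synchronized V get(K key) {
        return entries.get(key); // also marks the entry as recently used
    }

    synchronized void put(K key, V value) {
        V previous = entries.put(key, value);
        if (previous != null && previous != value) {
            onEvict.accept(key, previous); // replaced value is cleaned up now
        }
        // Enforce the bound immediately, evicting the LRU entries first.
        while (entries.size() > maxEntries) {
            Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
            Map.Entry<K, V> eldest = it.next();
            K evictedKey = eldest.getKey();
            V evictedValue = eldest.getValue();
            it.remove();
            onEvict.accept(evictedKey, evictedValue);
        }
    }

    // Deterministic cleanup requested by the caller.
    synchronized void release(K key) {
        V value = entries.remove(key);
        if (value != null) {
            onEvict.accept(key, value);
        }
    }

    synchronized int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        List<String> evicted = new ArrayList<>();
        BoundedRegionCache<String, Integer> cache =
                new BoundedRegionCache<>(2, (k, v) -> evicted.add(k));
        cache.put("blk_1", 1);
        cache.put("blk_2", 2);
        cache.get("blk_1");    // blk_2 is now the least recently used
        cache.put("blk_3", 3); // exceeds the bound, so blk_2 is evicted
        cache.release("blk_1");
        System.out.println(evicted); // prints [blk_2, blk_1]
    }
}
```

The existing timer-based expiry could be one policy among several behind the 
same interface; the sketch only covers the bounded, synchronous variant.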



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
