[ 
https://issues.apache.org/jira/browse/HDFS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155818#comment-14155818
 ] 

Chris Nauroth commented on HDFS-6919:
-------------------------------------

I've always thought of cache pools as an abstraction over {{ulimit -l}}.  The 
cache pool defines permissions and an upper limit on the number of bytes that 
can be locked into memory via cache directives in that cache pool.  When an 
admin creates a cache pool of a certain size and grants access to a set of 
users, it's analogous to using {{ulimit}} to restrict those users' hard limit 
for locked memory.
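
To make the analogy concrete, here is a minimal sketch against the cache-pool
admin API (the pool name, owner/group, path, and 10 GB limit are all made-up
values for illustration):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class CachePoolAsUlimit {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());

    // The pool's limit plays the role of `ulimit -l`: an upper bound on the
    // bytes that directives in this pool may lock into DataNode memory.
    dfs.addCachePool(new CachePoolInfo("analyticsPool")
        .setOwnerName("hive")
        .setGroupName("analysts")
        .setMode(new FsPermission((short) 0775))
        .setLimit(10L * 1024 * 1024 * 1024));   // 10 GB cap

    // Cache directives in the pool charge their locked bytes against the
    // pool's limit, much as mlock'ed pages count against `ulimit -l`.
    dfs.addCacheDirective(new CacheDirectiveInfo.Builder()
        .setPath(new Path("/warehouse/hot_table"))
        .setPool("analyticsPool")
        .build());
  }
}
{code}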

In-memory replica writes behave a lot like virtual memory.  Clients simply 
write, and if lazy-persist is enabled, the DataNode is free to buffer any 
amount of the written data in memory.  This is best-effort, though: under RAM 
contention, some portion of the in-memory data falls back to disk, which is 
analogous to paging.  Since the RAM-vs.-disk distinction is largely abstracted 
away from the client writer, this gives us a lot of freedom to evolve smarter 
cache eviction policies over time.
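
For illustration, a minimal sketch of the client side, assuming the
{{CreateFlag.LAZY_PERSIST}} API from HDFS-6581 (the path and sizes are
arbitrary):

{code:java}
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class LazyPersistWrite {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path scratch = new Path("/tmp/scratch.dat");   // hypothetical path

    // LAZY_PERSIST asks the DataNode to buffer the replica in RAM on a
    // best-effort basis; under memory pressure it falls back to disk.
    try (FSDataOutputStream out = fs.create(scratch,
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.LAZY_PERSIST),
        4096,                      // I/O buffer size
        (short) 1,                 // lazy-persist writes typically use replication 1
        fs.getDefaultBlockSize(scratch),
        null)) {                   // no progress callback
      // The client just writes; RAM vs. disk is the DataNode's decision.
      out.write(new byte[8 * 1024 * 1024]);
    }
  }
}
{code}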

Arpit has also pointed out that a single writer can write over 1.5 GB to memory 
in 3 seconds, and this may improve as we find optimizations in the write 
pipeline.  With multiple concurrent writers, it's possible that RAM disk usage 
could change entirely within a heartbeat interval.  This conflicts with cache 
pool enforcement, which happens centrally at the NameNode and is subject to the 
latency of the heartbeat interval.
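
Back-of-the-envelope, assuming the default 3-second {{dfs.heartbeat.interval}}
and the single-writer rate above:

{noformat}
rate per writer     = 1.5 GB / 3 s = 0.5 GB/s
churn per heartbeat = N writers * 0.5 GB/s * 3 s = 1.5N GB
{noformat}

So even a handful of concurrent writers can fill or drain an entire RAM disk
between two consecutive heartbeats, before the NameNode ever hears about it.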

Considering the above factors, I don't see a beneficial way to apply cache 
pools to in-memory replica writes.  Using a single cache pool named lazyPersist 
would forfeit one of the main features of cache pools: the ability to set 
different limits for different users.  Doing something more sophisticated, such 
as trying to match existing cache directives to the paths of in-memory replica 
writes, implies tighter coupling between the NameNode and the DataNode to pass 
cache pool information around (currently encapsulated at the NameNode).  And 
given the heartbeat latency discussed above, it's possible that none of this 
enforcement would be effective anyway.  Conceptually, users might also find 
cache pools confusing here, since there is no analogous OS knob that enforces a 
quota on what portion of a traditional file descriptor write goes to the buffer 
cache.

> Enforce a single limit for RAM disk usage and replicas cached via locking
> -------------------------------------------------------------------------
>
>                 Key: HDFS-6919
>                 URL: https://issues.apache.org/jira/browse/HDFS-6919
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Arpit Agarwal
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>
> The DataNode can have a single limit for memory usage which applies to both 
> replicas cached via CCM and replicas on RAM disk.
> See comments 
> [1|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025], 
> [2|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106245&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106245] 
> and 
> [3|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106575&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106575] 
> for discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
