[ https://issues.apache.org/jira/browse/HDFS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155818#comment-14155818 ]
Chris Nauroth commented on HDFS-6919:
-------------------------------------

I've always thought of cache pools as an abstraction over {{ulimit -l}}. The cache pool defines permissions and an upper limit on the number of bytes that can be locked into memory via cache directives in that pool. When an admin creates a cache pool of a certain size and grants access to a set of users, it's analogous to using {{ulimit}} to restrict those users' hard limit for locked memory. (See the first sketch appended at the end of this message.)

In-memory replica writes behave a lot like virtual memory. Clients simply write, and if lazy-persist is enabled, the DataNode is free to buffer any amount of written data in memory. This is best-effort, though: contention for RAM triggers fallback to disk for some portion of the in-memory data, which is analogous to paging. (See the second sketch appended below.) Since the RAM-vs.-disk distinction is largely abstracted away from the client writer, we have a lot of freedom to evolve smarter cache eviction policies over time.

Arpit has also pointed out that a single writer can push over 1.5 GB into memory in 3 seconds, and this may improve as we find optimizations in the write pipeline. The default heartbeat interval ({{dfs.heartbeat.interval}}) is itself 3 seconds, so with multiple concurrent writers it's possible for RAM disk usage to change entirely within a single heartbeat interval. This conflicts with cache pool enforcement, which happens centrally at the NameNode and is therefore subject to the latency of the heartbeat interval.

Considering the above factors, I don't see a beneficial way to apply cache pools to in-memory replica writes. Using a single cache pool named {{lazyPersist}} would sidestep one of the main features of cache pools: the ability to set different limits for different users. Doing something more sophisticated, such as matching existing cache directives to the paths of in-memory replica writes, implies tighter coupling between the NameNode and the DataNode to pass cache pool information around (it is currently encapsulated at the NameNode). And given the latency of the heartbeat interval, it's possible that none of this enforcement would be effective anyway. Conceptually, users might also find cache pools confusing here, since there is no analogous OS knob that enforces a quota on what portion of a traditional file descriptor write goes to the buffer cache.

> Enforce a single limit for RAM disk usage and replicas cached via locking
> --------------------------------------------------------------------------
>
>                 Key: HDFS-6919
>                 URL: https://issues.apache.org/jira/browse/HDFS-6919
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Arpit Agarwal
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>
> The DataNode can have a single limit for memory usage which applies to both replicas cached via CCM and replicas on RAM disk.
> See comments [1|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025], [2|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106245&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106245] and [3|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106575&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106575] for discussion.
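To make the {{ulimit -l}} analogy concrete, here is a minimal sketch of the setup described in the first paragraph of the comment, written against the {{DistributedFileSystem}} cache admin API ({{addCachePool}} / {{addCacheDirective}}). The pool name, owner, byte limit, and path are invented for illustration only.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class CachePoolSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumes fs.defaultFS points at an HDFS cluster.
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

    // Admin side: a hypothetical pool "analytics" may lock at most 1 GB
    // into DataNode memory -- the cache pool analog of `ulimit -l`.
    dfs.addCachePool(new CachePoolInfo("analytics")
        .setOwnerName("arpit")                     // hypothetical owner
        .setGroupName("analytics")
        .setMode(new FsPermission((short) 0770))   // who may add directives
        .setLimit(1024L * 1024 * 1024));           // 1 GB locked-memory cap

    // User side: each cache directive locks a path's replicas into memory,
    // charged against the pool's byte limit.
    long id = dfs.addCacheDirective(new CacheDirectiveInfo.Builder()
        .setPath(new Path("/data/hot-table"))      // hypothetical path
        .setReplication((short) 1)
        .setPool("analytics")
        .build());
    System.out.println("Added cache directive " + id);
  }
}
{code}

The pool's byte limit is a locked-memory budget shared by every directive in the pool, which is the same role {{ulimit -l}} plays for a user's processes.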
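And a second sketch of the lazy-persist write path from the client's point of view, assuming the HDFS 2.6+ client API; the path and sizes are invented. The writer only sets {{CreateFlag.LAZY_PERSIST}}; whether the replica lives on RAM disk or falls back to disk is entirely the DataNode's decision.

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class LazyPersistWriteSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path scratch = new Path("/tmp/scratch/output.bin"); // hypothetical path

    // LAZY_PERSIST asks the DataNode to hold the replica on RAM disk and
    // persist it to disk lazily. Under memory pressure the DataNode may
    // place the replica on disk instead -- the "paging" fallback; the
    // writer never sees the difference.
    try (FSDataOutputStream out = fs.create(
        scratch,
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.LAZY_PERSIST),
        4096,                               // io buffer size
        (short) 1,                          // lazy-persist files are single-replica
        fs.getDefaultBlockSize(scratch),
        null)) {                            // no Progressable callback
      out.write(new byte[64 * 1024]);       // the client just writes bytes
    }
  }
}
{code}

Note that, unlike the cache pool path, there is no per-user knob anywhere in this call, and the client cannot even observe whether its bytes stayed in memory -- which is the crux of the mismatch with cache pool enforcement.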