[ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740145#comment-13740145 ]
Andrew Wang commented on HDFS-4949:
-----------------------------------

Hi Arun,

On the read path comments, it might be elucidating to check out the zero-copy read API that Colin's working on at HDFS-4953. The idea is that clients always use the zero-copy cursor to do reads, which behind the scenes will do an mmap'd read if the block is cached, or a normal copying read if the block is on disk or remote. It allows an {{isCached}}-type check by not setting a fallback buffer for copying reads: the cursor will then throw an exception on read if the block is not cached. Finally, there's also a parameter for enabling short reads, which comes into play when a read spans block files.

On YARN integration, I'd like to revisit that a little way down the road, since we're focusing on getting a basic prototype out. If you want to get started on it now, it'd be helpful if you could review the current RM plan in the doc and sketch out how a YARN-based architecture would look.

> Centralized cache management in HDFS
> ------------------------------------
>
>                 Key: HDFS-4949
>                 URL: https://issues.apache.org/jira/browse/HDFS-4949
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: caching-design-doc-2013-07-02.pdf, caching-design-doc-2013-08-09.pdf
>
>
> HDFS currently has no support for managing or exposing in-memory caches at
> datanodes. This makes it harder for higher-level application frameworks like
> Hive, Pig, and Impala to effectively use cluster memory, because they cannot
> explicitly cache important datasets or place their tasks for memory locality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
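The read-path behavior described in the comment (zero-copy read for cached blocks, copying read otherwise, an exception when no fallback buffer is set, and optional short reads at block-file boundaries) can be modeled in a small, self-contained sketch. This is not the HDFS-4953 API: {{Cursor}}, {{Block}}, {{NotCachedException}}, {{setFallbackBuffer}} and {{setAllowShortReads}} are hypothetical names, a cached block is simulated by an in-memory byte array standing in for an mmap'd file, and the exact way short reads interact with the fallback buffer is an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical model of the read semantics described in the comment.
// None of these names are taken from the real HDFS-4953 API.
final class ZeroCopyCursorSketch {

    /** Hypothetical: a copying read was needed but no fallback buffer is set. */
    static final class NotCachedException extends RuntimeException {
        NotCachedException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for one block file; `cached` models an mmap'd cached block. */
    static final class Block {
        final byte[] data;
        final boolean cached;
        Block(byte[] data, boolean cached) { this.data = data; this.cached = cached; }
    }

    static final class Cursor {
        private final Block[] blocks;
        private long pos = 0;
        private ByteBuffer fallback = null;     // null: copying reads are not allowed
        private boolean allowShortReads = false;

        Cursor(Block... blocks) { this.blocks = blocks; }

        void setFallbackBuffer(ByteBuffer buf) { this.fallback = buf; }
        void setAllowShortReads(boolean allow) { this.allowShortReads = allow; }

        /** Returns up to len bytes starting at the current position. */
        ByteBuffer read(int len) {
            // Find the block file containing the current position.
            int idx = 0;
            long off = pos;
            while (idx < blocks.length && off >= blocks[idx].data.length) {
                off -= blocks[idx].data.length;
                idx++;
            }
            if (idx == blocks.length) return ByteBuffer.allocate(0); // end of file
            Block b = blocks[idx];
            int inBlock = b.data.length - (int) off;

            // Zero-copy path: the block is cached and the read either fits in this
            // block file or the caller accepts a short read ending at its boundary.
            if (b.cached && (len <= inBlock || allowShortReads)) {
                int n = Math.min(len, inBlock);
                pos += n;
                // Read-only view over the block's bytes; nothing is copied.
                return ByteBuffer.wrap(b.data, (int) off, n).slice().asReadOnlyBuffer();
            }

            // Everything else needs a copying read; without a fallback buffer the
            // cursor throws, which doubles as an isCached-style check.
            if (fallback == null) {
                throw new NotCachedException("zero-copy read not possible at offset " + pos);
            }
            fallback.clear();
            int want = Math.min(len, fallback.capacity());
            // Copy across block files until the request or the buffer is satisfied.
            while (want > 0 && idx < blocks.length) {
                Block cur = blocks[idx];
                int n = Math.min(want, cur.data.length - (int) off);
                fallback.put(cur.data, (int) off, n);
                want -= n;
                pos += n;
                idx++;
                off = 0;
            }
            fallback.flip();
            return fallback;
        }
    }
}
```

In this model, leaving the fallback buffer unset turns every {{read}} into a cache probe: a caller can catch {{NotCachedException}} and retry after calling {{setFallbackBuffer}}. Enabling short reads lets a cached read stop at the end of the current block file instead of falling back to a copy.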