[ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735643#comment-13735643
 ] 

Arun C Murthy commented on HDFS-4949:
-------------------------------------

[~andrew.wang] overall it looks great; some more questions:

# I'm not sure you want to automatically add new files in a directory to the 
cache; it seems a higher-level system (Hive, Impala, HCat) is in a better 
position to do that. Not doing this automatically simplifies cache management, 
quota management, etc.
# Can you please provide details on the read APIs? For the Hive/MR/Pig use case, 
I'd like to see a new open(Path, offset, length) that returns an indicator of 
whether the block is cached. The RecordReader, for example, would use this to 
read the split.
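
To make the second question concrete, here is a rough, self-contained sketch of the shape such a call could take. Everything in it is hypothetical: `ToyClient`, `OpenResult`, `markCached`, the `String` path, and the 128-byte block size are stand-ins for illustration and are not part of HDFS or of the design doc. It assumes a range counts as cached only when every block it overlaps is cached.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only; none of these names exist in HDFS. */
public class CachedReadSketch {

    /** What the proposed open(path, offset, length) might return: the range plus a cache indicator. */
    static final class OpenResult {
        final String path;
        final long offset;
        final long length;
        final boolean cached;

        OpenResult(String path, long offset, long length, boolean cached) {
            this.path = path;
            this.offset = offset;
            this.length = length;
            this.cached = cached;
        }
    }

    /** Toy stand-in for a client that knows which blocks are cached. */
    static final class ToyClient {
        private final long blockSize;
        private final Set<String> cachedBlocks = new HashSet<>();

        ToyClient(long blockSize) {
            this.blockSize = blockSize;
        }

        /** Records that the given block of the given file is cached (illustration only). */
        void markCached(String path, long blockIndex) {
            cachedBlocks.add(path + "#" + blockIndex);
        }

        /** Reports the range as cached only if every block overlapping it is cached. */
        OpenResult open(String path, long offset, long length) {
            long first = offset / blockSize;
            long last = (offset + length - 1) / blockSize;
            boolean cached = length > 0;
            for (long b = first; b <= last && cached; b++) {
                cached = cachedBlocks.contains(path + "#" + b);
            }
            return new OpenResult(path, offset, length, cached);
        }
    }

    public static void main(String[] args) {
        ToyClient client = new ToyClient(128L);
        client.markCached("/warehouse/t/part-0", 0);

        // A RecordReader-like caller would open exactly the range of its split.
        OpenResult split0 = client.open("/warehouse/t/part-0", 0, 128);
        OpenResult split1 = client.open("/warehouse/t/part-0", 128, 128);
        System.out.println("split0 cached: " + split0.cached);
        System.out.println("split1 cached: " + split1.cached);
    }
}
```

A caller reading a split could branch on the `cached` flag, for instance to pick a faster read path when the whole range is in memory. Whether the indicator should be per range or per block is an open design choice that this sketch does not settle.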
                
> Centralized cache management in HDFS
> ------------------------------------
>
>                 Key: HDFS-4949
>                 URL: https://issues.apache.org/jira/browse/HDFS-4949
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: caching-design-doc-2013-07-02.pdf, 
> caching-design-doc-2013-08-09.pdf
>
>
> HDFS currently has no support for managing or exposing in-memory caches at 
> datanodes. This makes it harder for higher level application frameworks like 
> Hive, Pig, and Impala to effectively use cluster memory, because they cannot 
> explicitly cache important datasets or place their tasks for memory locality.

