[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2016-03-28 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-4949:
--
Fix Version/s: 2.3.0

> Centralized cache management in HDFS
> 
> Key: HDFS-4949
> URL: https://issues.apache.org/jira/browse/HDFS-4949
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: 3.0.0, 2.3.0
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Fix For: 2.3.0
>
> Attachments: HDFS-4949-consolidated.patch, 
> caching-design-doc-2013-07-02.pdf, caching-design-doc-2013-08-09.pdf, 
> caching-design-doc-2013-10-24.pdf, caching-testplan.pdf, 
> hdfs-4949-branch-2.patch
>
>
> HDFS currently has no support for managing or exposing in-memory caches at 
> datanodes. This makes it harder for higher level application frameworks like 
> Hive, Pig, and Impala to effectively use cluster memory, because they cannot 
> explicitly cache important datasets or place their tasks for memory locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2016-03-27 Thread linhaiqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

linhaiqiang updated HDFS-4949:
--
Fix Version/s: (was: 2.3.0)

> Centralized cache management in HDFS
> 
> Key: HDFS-4949
> URL: https://issues.apache.org/jira/browse/HDFS-4949
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: 3.0.0, 2.3.0
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: HDFS-4949-consolidated.patch, 
> caching-design-doc-2013-07-02.pdf, caching-design-doc-2013-08-09.pdf, 
> caching-design-doc-2013-10-24.pdf, caching-testplan.pdf, 
> hdfs-4949-branch-2.patch
>
>


[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2014-01-22 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-4949:
--

   Resolution: Fixed
Fix Version/s: 2.4.0
   Status: Resolved  (was: Patch Available)

I went through and resolved or pushed out all remaining subtasks. With the code 
in branch-2, we can resolve this parent issue. Thanks for all the contributions 
from everyone involved!

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.4.0

 Attachments: HDFS-4949-consolidated.patch, 
 caching-design-doc-2013-07-02.pdf, caching-design-doc-2013-08-09.pdf, 
 caching-design-doc-2013-10-24.pdf, caching-testplan.pdf, 
 hdfs-4949-branch-2.patch




[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2014-01-21 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-4949:
--

Attachment: hdfs-4949-branch-2.patch

Attached is a consolidated patch for branch-2. Unfortunately we left the 
HDFS-4949 branch fallow while development continued in trunk, but I did my best 
to squash all of the caching-related patches committed thus far into this mega 
patch. A preliminary test run of HDFS and Common looked good, but I'm running 
another right now on this version of the patch to verify.

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: HDFS-4949-consolidated.patch, 
 caching-design-doc-2013-07-02.pdf, caching-design-doc-2013-08-09.pdf, 
 caching-design-doc-2013-10-24.pdf, caching-testplan.pdf, 
 hdfs-4949-branch-2.patch




[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2013-10-24 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-4949:
--

Status: Patch Available  (was: Open)

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: caching-design-doc-2013-07-02.pdf, 
 caching-design-doc-2013-08-09.pdf, caching-testplan.pdf, 
 HDFS-4949-consolidated.patch




[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2013-10-24 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-4949:
--

Attachment: HDFS-4949-consolidated.patch

Consolidated patch attached to get a Jenkins run.

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: caching-design-doc-2013-07-02.pdf, 
 caching-design-doc-2013-08-09.pdf, caching-testplan.pdf, 
 HDFS-4949-consolidated.patch




[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2013-10-24 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4949:
---

Attachment: caching-design-doc-2013-10-24.pdf

updated design doc.

Revisions:
* change future tense to present tense in some cases.
* grammar corrections
* update to reflect the fact that caching information is stored in 
{{LocatedBlocks}} rather than {{BlockLocation}}
* move cache expiry feature to future work
* remove part about pools being in a configuration file (they are stored in the 
edit log)
* rework API documentation to match current API
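One way a framework could use per-block caching information for the memory locality mentioned in the issue description is a scheduling rule that prefers a host holding a cached replica. The Python sketch below assumes a hypothetical, simplified BlockInfo record listing the hosts with a replica on disk and the subset that also hold it in cache; it is not the HDFS {{LocatedBlocks}} API, and pick_host is an illustrative rule rather than anything the design doc specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockInfo:
    """Hypothetical, simplified view of a located block."""
    locations: tuple              # hosts holding a replica on disk
    cached_locations: tuple = ()  # subset of hosts also holding it in cache


def pick_host(block, free_hosts):
    """Choose where to run a task that reads `block`.

    Prefer a free host with a cached replica, then any free host with a
    replica on disk, and otherwise fall back to a remote read.
    """
    for host in block.cached_locations:
        if host in free_hosts:
            return host, "memory-local"
    for host in block.locations:
        if host in free_hosts:
            return host, "disk-local"
    return None, "remote"


if __name__ == "__main__":
    block = BlockInfo(locations=("dn1", "dn2", "dn3"), cached_locations=("dn2",))
    print(pick_host(block, {"dn1", "dn2"}))
```

A real scheduler would weigh this preference against load and delay scheduling; the sketch only shows why exposing cached locations per block is enough to steer tasks toward memory.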

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: caching-design-doc-2013-07-02.pdf, 
 caching-design-doc-2013-08-09.pdf, caching-design-doc-2013-10-24.pdf, 
 caching-testplan.pdf, HDFS-4949-consolidated.patch




[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2013-10-24 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-4949:
--

Attachment: (was: HDFS-4949-consolidated.patch)

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: caching-design-doc-2013-07-02.pdf, 
 caching-design-doc-2013-08-09.pdf, caching-design-doc-2013-10-24.pdf, 
 caching-testplan.pdf




[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2013-10-24 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-4949:
--

Attachment: HDFS-4949-consolidated.patch

New patch attached. The RAT error is spurious; it comes from the CHANGES-HDFS-4949.txt 
file. I also want to fix the edits unit test after we merge, since historically 
checking in the new binary file has been tricky to get right via patch. We've 
gotten clean unit test runs on upstream Jenkins, so I'm confident that it's 
correct. Finally, the javac warnings are intentional; they come from using the 
internal unmap APIs.

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: caching-design-doc-2013-07-02.pdf, 
 caching-design-doc-2013-08-09.pdf, caching-design-doc-2013-10-24.pdf, 
 caching-testplan.pdf, HDFS-4949-consolidated.patch




[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2013-10-23 Thread Stephen Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-4949:
--

Attachment: caching-testplan.pdf

Attaching the test plan for this feature (caching-testplan.pdf).

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: caching-design-doc-2013-07-02.pdf, 
 caching-design-doc-2013-08-09.pdf, caching-testplan.pdf




[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2013-08-09 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-4949:
--

Attachment: caching-design-doc-2013-08-09.pdf

Suresh, thanks for posting your notes. Attached is a revised design doc that 
beefs up the resource management / user quotas section and addresses your 
other, smaller points.

As a meta-point, I think much of the remaining resource management design can 
wait until after we get the initial end-to-end implementation going. It's 
reasonable for the first iteration to do something simple, like superuser-only 
caching or user quotas, and then layer on the complexities of pools, priorities, 
ACLs, min/max/share, and failure cases afterwards. It's good to get the API 
roughly right so we code with foresight, but I don't see us getting around to 
implementing pools for at least a month or two.
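As an illustration of the "something simple" first iteration, the Python sketch below admits cache requests against a per-user byte quota, with superusers exempt. CacheQuotaTracker, request_cache and release are hypothetical names chosen for this example, not HDFS APIs; the sketch deliberately leaves out pools, priorities, ACLs, min/max/share and failure handling, which the comment above defers.

```python
from dataclasses import dataclass, field


@dataclass
class CacheQuotaTracker:
    """Hypothetical first-iteration policy: superusers are unlimited,
    everyone else gets a fixed byte quota for cached data."""

    quotas: dict                                 # user -> max cached bytes
    superusers: frozenset = frozenset()
    used: dict = field(default_factory=dict)     # user -> bytes currently cached
    entries: dict = field(default_factory=dict)  # (user, path) -> bytes

    def request_cache(self, user, path, nbytes):
        """Admit a cache request only if it fits the user's quota."""
        if (user, path) in self.entries:
            return True  # already cached for this user; nothing new to charge
        if user not in self.superusers:
            limit = self.quotas.get(user, 0)  # users without a quota cannot cache
            if self.used.get(user, 0) + nbytes > limit:
                return False
        self.entries[(user, path)] = nbytes
        self.used[user] = self.used.get(user, 0) + nbytes
        return True

    def release(self, user, path):
        """Uncache a path and return its bytes to the user's quota."""
        nbytes = self.entries.pop((user, path), 0)
        self.used[user] = self.used.get(user, 0) - nbytes


if __name__ == "__main__":
    tracker = CacheQuotaTracker(quotas={"alice": 100}, superusers=frozenset({"hdfs"}))
    print(tracker.request_cache("alice", "/data/a", 60))
    print(tracker.request_cache("alice", "/data/b", 50))
```

Charging bytes per (user, path) keeps the accounting trivial, which is the point of starting simple: later features such as pools could replace the per-user key without changing the admission check.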

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: caching-design-doc-2013-07-02.pdf, 
 caching-design-doc-2013-08-09.pdf



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4949) Centralized cache management in HDFS

2013-07-02 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-4949:
--

Attachment: caching-design-doc-2013-07-02.pdf

Here's a design doc that we've been working on internally. It proposes adding 
off-heap caches to each datanode using mmap and mlock, managed centrally by the 
NameNode.

Any feedback is welcome. I'm hoping we can have a fruitful design discussion on 
this JIRA, and then perhaps get a branch and start development.
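To make the mmap/mlock proposal concrete, here is a minimal Python sketch of a single cached replica, assuming a 64-bit Linux host. It maps a block file read-only and shared, then calls mlock on the mapping so its pages stay resident in memory outside any process heap. CachedReplica and its methods are hypothetical names for illustration only; HDFS itself is written in Java, so this shows the system calls involved, not the datanode implementation.

```python
import ctypes
import ctypes.util
import mmap
import os
import tempfile

# Assumes 64-bit Linux (off_t == C long). Python's mmap module cannot mlock(),
# so the mapping is made through libc directly to get a raw address.
# If find_library returns None, CDLL(None) resolves symbols from the process.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                       ctypes.c_int, ctypes.c_int, ctypes.c_long]
_libc.mmap.restype = ctypes.c_void_p
_libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
MAP_FAILED = ctypes.c_void_p(-1).value


class CachedReplica:
    """Hypothetical stand-in for one block replica held in a datanode cache."""

    def __init__(self, path):
        fd = os.open(path, os.O_RDONLY)
        try:
            self.size = os.fstat(fd).st_size
            addr = _libc.mmap(None, self.size, mmap.PROT_READ,
                              mmap.MAP_SHARED, fd, 0)
        finally:
            os.close(fd)  # the mapping stays valid after the fd is closed
        if addr in (None, MAP_FAILED):
            raise OSError(ctypes.get_errno(), "mmap failed")
        self.addr = addr
        # mlock faults every page in and keeps it resident. It can fail, e.g.
        # when RLIMIT_MEMLOCK is too small, so record the outcome, don't raise.
        self.locked = _libc.mlock(self.addr, self.size) == 0

    def read(self, offset, length):
        """Copy bytes straight out of the mapped (ideally locked) pages."""
        end = min(offset + length, self.size)
        return ctypes.string_at(self.addr + offset, max(0, end - offset))

    def close(self):
        if self.locked:
            _libc.munlock(self.addr, self.size)
        _libc.munmap(self.addr, self.size)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hdfs block bytes" * 256)  # one 4 KiB "block"
    replica = CachedReplica(tmp.name)
    print("locked:", replica.locked, "first bytes:", replica.read(0, 10))
    replica.close()
    os.unlink(tmp.name)
```

If mlock fails, the sketch still serves reads from the mapping but reports locked as False, so a caller could treat the replica as not cached rather than crash.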

 Centralized cache management in HDFS
 

 Key: HDFS-4949
 URL: https://issues.apache.org/jira/browse/HDFS-4949
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Affects Versions: 3.0.0, 2.2.0
Reporter: Andrew Wang
 Attachments: caching-design-doc-2013-07-02.pdf

