[jira] [Updated] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling

Andrew Wang (JIRA) Wed, 01 Aug 2012 17:18:06 -0700

     [ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andrew Wang updated HDFS-3672:
------------------------------

    Attachment: design-doc-v1.pdf

Attaching a design doc detailing the use cases and sketching out the future 
direction. Happy to expand on anything that's unclear.

Overall, I feel like there's strong interest in the API from multiple parties 
(the unnamed Cloudera customer, HBase, MR), and fairly clear potential 
performance improvements. I'd appreciate any advice on making it crystal clear 
to downstream users that this is an unstable API. We've already got the 
appropriate annotations, and I could also make it require a config option 
before doing anything useful (which I think satisfies "default off"). Any other 
suggestions?
                
> Expose disk-location information for blocks to enable better scheduling
> -----------------------------------------------------------------------
>
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, 
> hdfs-3672-3.patch, hdfs-3672-4.patch
>
>
> Currently, HDFS exposes which datanodes a block resides on, which allows 
> clients to make scheduling decisions for locality and load balancing. 
> Extending this to also expose which disk on a datanode holds a block 
> would enable even better scheduling, on a per-disk rather than a coarse 
> per-datanode basis.
> This API would likely look similar to FileSystem#getFileBlockLocations, but 
> would also involve a series of RPCs to the responsible datanodes to determine 
> disk ids.
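
To make the per-disk scheduling idea in the description concrete, here is a 
minimal, self-contained sketch of what a client-side scheduler might do once 
it knows a (datanode, disk id) pair for every block replica. It is not the 
API proposed in the patch or the design doc: the Replica type, the 
"datanode#diskN" key format, and both method names are hypothetical, made up 
for illustration, and no Hadoop classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DiskLocalitySketch {
    /** Hypothetical value type: one replica of a block on one disk of one datanode. */
    static final class Replica {
        final String blockId;
        final String datanode;
        final int diskId;

        Replica(String blockId, String datanode, int diskId) {
            this.blockId = blockId;
            this.datanode = datanode;
            this.diskId = diskId;
        }
    }

    /** Hypothetical key identifying a single disk across the cluster. */
    static String diskKey(Replica r) {
        return r.datanode + "#disk" + r.diskId;
    }

    /** Groups block ids by the disk holding them, preserving input order per disk. */
    static Map<String, List<String>> groupByDisk(List<Replica> replicas) {
        Map<String, List<String>> byDisk = new TreeMap<>();
        for (Replica r : replicas) {
            byDisk.computeIfAbsent(diskKey(r), k -> new ArrayList<>()).add(r.blockId);
        }
        return byDisk;
    }

    /**
     * Picks the replica of blockId whose disk has the fewest pending tasks.
     * Disks missing from pendingPerDisk count as idle; ties go to the replica
     * listed first. Returns null if the block has no known replica.
     */
    static Replica pickReplica(String blockId, List<Replica> replicas,
                               Map<String, Integer> pendingPerDisk) {
        Replica best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (Replica r : replicas) {
            if (!r.blockId.equals(blockId)) {
                continue;
            }
            int load = pendingPerDisk.getOrDefault(diskKey(r), 0);
            if (load < bestLoad) {
                best = r;
                bestLoad = load;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
            new Replica("blk_1", "dn1", 0),
            new Replica("blk_2", "dn1", 0),
            new Replica("blk_1", "dn2", 1));

        // Per-disk view of where blocks live.
        System.out.println(groupByDisk(replicas));

        // dn1's disk 0 already has work queued, so prefer the replica on dn2.
        Map<String, Integer> pending = new HashMap<>();
        pending.put("dn1#disk0", 3);
        Replica chosen = pickReplica("blk_1", replicas, pending);
        System.out.println(chosen.datanode + " disk " + chosen.diskId);
    }
}
```

With only datanode-level locations, both replicas of blk_1 would look 
equally good to the scheduler; the disk id is what lets it steer reads away 
from a disk that is already busy.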

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
