[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling

Andrew Purtell (JIRA) Wed, 15 Aug 2012 11:15:39 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435363#comment-13435363
 ]


Andrew Purtell commented on HDFS-3672:
--------------------------------------

@Suresh, thanks for linking HBASE-6572 to HDFS-2832; I had missed that issue. 
That is the better linkage.

If HDFS is to support heterogeneous/tiered storage, then the NNs and DNs must 
somehow negotiate block placement by policy. For example, suppose the NN does 
some kind of path-based mapping of files -> blocks -> device type, with disk as 
the default. Now the user updates the policy for a subtree of the namespace to 
solid state. For any new file in that subtree, the NN would presumably pass a 
hint to the DFSClient, and the DFSClient would in turn pass the hint to the 
DNs: place the block on the desired media type or fail. For any existing file 
in the subtree, the NN would need to migrate blocks from one storage tier to 
another. Presumably the DN must include the "disk location", including the 
media type, in block reports so that the NN has the information it needs to do 
that. Simply exposing that "disk location" information via an API is the intent 
of this issue, right? If so, could scratching this one itch serve as 
incremental development toward the larger goal? Happy to take this discussion 
to HDFS-2832 or offline, or to simply drop it if it is a distraction or in 
error.
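
To make the path-based mapping above concrete, here is a minimal sketch of how 
a subtree policy might resolve to a media type, with disk as the default. 
Everything in it is hypothetical: the class StoragePolicySketch, the Media 
enum, and the methods setPolicy and resolve are illustration only and are not 
HDFS APIs. It assumes the nearest ancestor directory with a policy wins.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of path-based storage policy lookup; not an HDFS class. */
public class StoragePolicySketch {
    /** Hypothetical media types; the discussion only mentions disk and solid state. */
    public enum Media { DISK, SSD }

    // Maps a subtree root (for example "/hbase") to the media type requested for it.
    private final Map<String, Media> policies = new HashMap<>();

    /** Records the desired media type for everything under the given subtree. */
    public void setPolicy(String subtree, Media media) {
        policies.put(normalize(subtree), media);
    }

    /**
     * Walks from the file's path up toward the root. The nearest ancestor
     * with a policy wins; if none has one, the default is DISK.
     */
    public Media resolve(String path) {
        String p = normalize(path);
        while (!p.isEmpty()) {
            Media m = policies.get(p);
            if (m != null) {
                return m;
            }
            int slash = p.lastIndexOf('/');
            // Stop once only the leading slash (or nothing) is left.
            p = slash <= 0 ? "" : p.substring(0, slash);
        }
        Media root = policies.get("/");
        return root != null ? root : Media.DISK;
    }

    /** Strips trailing slashes, leaving the root path "/" untouched. */
    private static String normalize(String path) {
        while (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path;
    }
}
```

Under these assumptions, the result of resolve for a new file is the hint the 
NN would hand to the DFSClient. Migrating existing files is a separate problem 
that this sketch does not address.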
                
> Expose disk-location information for blocks to enable better scheduling
> -----------------------------------------------------------------------
>
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: design-doc-v1.pdf, design-doc-v2.pdf, hdfs-3672-1.patch, 
> hdfs-3672-2.patch, hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, 
> hdfs-3672-6.patch, hdfs-3672-7.patch, hdfs-3672-8.patch, hdfs-3672-9.patch
>
>
> Currently, HDFS exposes on which datanodes a block resides, which allows 
> clients to make scheduling decisions for locality and load balancing. 
> Extending this to also expose on which disk on a datanode a block resides 
> would enable even better scheduling, on a per-disk rather than coarse 
> per-datanode basis.
> This API would likely look similar to FileSystem#getFileBlockLocations, but 
> also involve a series of RPCs to the responsible datanodes to determine disk 
> ids.
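
As an illustration of the per-disk scheduling idea in the description above, 
the following sketch picks, among a block's replicas, the one whose disk 
currently has the fewest reads assigned. It is a rough sketch under stated 
assumptions: the class DiskAwareSchedulerSketch, the Replica type, and the 
integer disk id are all hypothetical and do not reflect the API proposed in the 
attached patches.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-disk read scheduler; not part of HDFS or of the patches. */
public class DiskAwareSchedulerSketch {
    /** Hypothetical replica location: a datanode name plus an opaque disk id. */
    public static final class Replica {
        public final String datanode;
        public final int diskId;

        public Replica(String datanode, int diskId) {
            this.datanode = datanode;
            this.diskId = diskId;
        }

        /** Identifies one physical disk across the cluster. */
        String key() {
            return datanode + "#" + diskId;
        }
    }

    // Number of reads currently assigned to each (datanode, disk) pair.
    private final Map<String, Integer> assigned = new HashMap<>();

    /**
     * Chooses the replica on the least-loaded disk and records the assignment.
     * Ties go to the replica that appears first in the list.
     */
    public Replica assign(List<Replica> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("no replicas");
        }
        Replica best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (Replica r : replicas) {
            int load = assigned.getOrDefault(r.key(), 0);
            if (load < bestLoad) {
                best = r;
                bestLoad = load;
            }
        }
        assigned.merge(best.key(), 1, Integer::sum);
        return best;
    }
}
```

With only per-datanode information, two reads sent to the same datanode could 
land on the same disk. Keying the load count on the disk as well lets a 
scheduler spread them across spindles.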

