[ 
https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420788#comment-13420788
 ] 

Todd Lipcon commented on HDFS-3672:
-----------------------------------

Hey Suresh. I agree with all your points above.

One thing that's been talked about in the past is to consider using a 
local-only block pool for MR temp storage. That would at least get one of the 
other major disk users going through the same code paths.

The other idea we're thinking about is to expose disk statistics, such as 
current queue length and utilization for each local disk, as reported by the 
OS. We're still running some experiments locally, but our assumption is that, 
over short time-scales (~0.5 seconds), the usage over the trailing 0.5 seconds 
is a reasonably good predictor of the next 0.5 seconds, given that most 
Hadoop-style access is in chunks of 100MB+ of data.
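
To make the trailing-window idea concrete, here is a minimal sketch, not the 
proposed implementation. It assumes Linux and the /proc/diskstats text format 
(the comment above does not say which OS interface would be used), and the 
names DiskSample, parse_diskstats, utilization and least_busy are hypothetical. 
It takes two samples one window apart, computes the fraction of the window each 
disk was busy, and treats that as the estimate for the next window.

```python
from typing import Dict, NamedTuple


class DiskSample(NamedTuple):
    in_flight: int    # I/Os currently in progress (current queue length)
    io_ticks_ms: int  # cumulative milliseconds the device spent doing I/O


def parse_diskstats(text: str) -> Dict[str, DiskSample]:
    """Parse text in the Linux /proc/diskstats layout (assumed format)."""
    samples = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        # fields[2] is the device name, fields[11] the in-flight I/O count,
        # fields[12] the cumulative busy time in milliseconds.
        samples[fields[2]] = DiskSample(int(fields[11]), int(fields[12]))
    return samples


def utilization(prev: Dict[str, DiskSample],
                curr: Dict[str, DiskSample],
                window_ms: float) -> Dict[str, float]:
    """Fraction of the window each device was busy, clamped to [0, 1]."""
    util = {}
    for name, now in curr.items():
        if name not in prev:
            continue  # device appeared mid-window; no baseline to diff against
        busy_ms = now.io_ticks_ms - prev[name].io_ticks_ms
        util[name] = min(1.0, max(0.0, busy_ms / window_ms))
    return util


def least_busy(util: Dict[str, float]) -> str:
    """Pick the disk with the lowest trailing-window utilization."""
    return min(util, key=util.get)
```

In use, a caller would read /proc/diskstats, sleep for the window (0.5 seconds 
here), read it again, and pass both parsed samples to utilization with 
window_ms=500. Whether the trailing window actually predicts the next one is 
the open question the experiments are meant to answer.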

So, are you OK with introducing these as Unstable-annotated APIs, perhaps with 
an extra JavaDoc warning that they are explicitly experimental and may cease to 
exist in the future?
                
> Expose disk-location information for blocks to enable better scheduling
> -----------------------------------------------------------------------
>
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-3672-1.patch
>
>
> Currently, HDFS exposes on which datanodes a block resides, which allows 
> clients to make scheduling decisions for locality and load balancing. 
> Extending this to also expose on which disk on a datanode a block resides 
> would enable even better scheduling, on a per-disk rather than coarse 
> per-datanode basis.
> This API would likely look similar to FileSystem#getFileBlockLocations, but 
> also involve a series of RPCs to the responsible datanodes to determine disk 
> ids.
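
The "series of RPCs" in the description above could be batched per datanode. 
The sketch below only illustrates that fan-out and is not the API in the 
attached patch: disk_ids_for_blocks and the query_datanode callback are 
hypothetical, and the callback stands in for whatever datanode RPC would 
return disk ids.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def disk_ids_for_blocks(
    block_locations: Dict[str, List[str]],
    query_datanode: Callable[[str, List[str]], Dict[str, int]],
) -> Dict[Tuple[str, str], int]:
    """Map (block id, datanode) to a disk id using one call per datanode.

    block_locations maps a block id to the datanodes holding a replica.
    query_datanode(datanode, block_ids) is a hypothetical stand-in for a
    single RPC that returns block id -> disk id for that datanode.
    """
    # Group blocks by datanode so each datanode is queried exactly once.
    by_datanode: Dict[str, List[str]] = defaultdict(list)
    for block, datanodes in block_locations.items():
        for datanode in datanodes:
            by_datanode[datanode].append(block)

    result: Dict[Tuple[str, str], int] = {}
    for datanode, blocks in by_datanode.items():
        for block, disk_id in query_datanode(datanode, blocks).items():
            result[(block, datanode)] = disk_id
    return result
```

A scheduler could then combine the datanode-level locations it already has 
with these disk ids to spread work across disks rather than just across 
datanodes.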

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
