[jira] [Commented] (HDFS-3672) Expose disk-location information for blocks to enable better scheduling

Aaron T. Myers (JIRA) Thu, 09 Aug 2012 06:39:21 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431802#comment-13431802
 ]


Aaron T. Myers commented on HDFS-3672:
--------------------------------------

bq. Why is this API marked @InterfaceAudience.Public? I think we should remove 
it and just leave @InterfaceStability.Unstable.

I was under the impression that all public classes needed to have an 
@InterfaceAudience annotation, and all public classes needed to have an 
@InterfaceStability annotation unless they're marked 
@InterfaceAudience.Private. Am I wrong about that?
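
To make the rule as I understand it concrete, here is a minimal sketch. It 
uses local stand-in annotations (AudiencePublic, AudiencePrivate, 
StabilityUnstable) rather than the real Hadoop classification annotations, and 
the class names and the satisfiesRule helper are hypothetical, so treat it as 
an illustration of the rule and not as anything taken from the patch:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Modifier;

/** Illustrative only: encodes the annotation rule described above. */
final class AnnotationRule {
    // Stand-ins for the Hadoop audience/stability annotations (assumption:
    // these are NOT the real types, just local look-alikes for the sketch).
    @Retention(RetentionPolicy.RUNTIME) @interface AudiencePublic {}
    @Retention(RetentionPolicy.RUNTIME) @interface AudiencePrivate {}
    @Retention(RetentionPolicy.RUNTIME) @interface StabilityUnstable {}

    // Public audience plus a stability level: satisfies the rule.
    @AudiencePublic @StabilityUnstable
    public static class ExperimentalApi {}

    // Private audience: no stability annotation required.
    @AudiencePrivate
    public static class InternalHelper {}

    // Public audience without a stability level: violates the rule.
    @AudiencePublic
    public static class MissingStability {}

    /** Returns true if the class follows the rule as stated in this comment. */
    static boolean satisfiesRule(Class<?> c) {
        if (!Modifier.isPublic(c.getModifiers())) {
            return true; // the rule only concerns public classes
        }
        boolean isPublicAudience = c.isAnnotationPresent(AudiencePublic.class);
        boolean isPrivateAudience = c.isAnnotationPresent(AudiencePrivate.class);
        boolean hasStability = c.isAnnotationPresent(StabilityUnstable.class);
        if (!isPublicAudience && !isPrivateAudience) {
            return false; // every public class needs an audience annotation
        }
        // Stability is required unless the audience is private.
        return isPrivateAudience || hasStability;
    }
}
```

Under this reading, dropping the audience annotation and keeping only the 
stability one would fail the check.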

bq. Configuration to turn off this functionality should be on the server side 
also. Otherwise a client can just enable this functionality without the admin 
having control over it.

I thought about this a fair bit while reviewing the code. My conclusion is that 
Arun's stated reason for wanting this feature disabled by default was "so that 
people who use this understand that this isn't necessarily supported." A 
client-side-only config serves that purpose. Making the config server-side as 
well only forces the admin to enable it and restart the cluster before any 
client can try the functionality. That is an unnecessary pain for both the 
admin and the user, and it does nothing to further Arun's stated goal. For that 
matter, why would an admin want to prevent clients from calling this API?

If you insist on having a server-side config for this, I'd like to suggest two 
separate configs: a server-side one that defaults to enabled, so that an admin 
may consciously disable it, and a client-side one that defaults to disabled, so 
that users of this API must consciously configure their client. The latter 
supports Arun's stated goal of making sure people are aware that it's an 
experimental API.
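
A rough sketch of how the two switches could combine, assuming plain key/value 
maps in place of real configuration objects. The key names and the 
DiskLocationGate class are hypothetical and are not taken from the patch:

```java
import java.util.Map;

/** Hypothetical sketch of the two-switch proposal; not actual HDFS code. */
final class DiskLocationGate {
    // Both key names are made up for illustration.
    static final String SERVER_KEY = "example.server.disk-locations.enabled";
    static final String CLIENT_KEY = "example.client.disk-locations.enabled";

    /** Server side: on unless an admin consciously turns it off. */
    static boolean serverAllows(Map<String, String> serverConf) {
        return Boolean.parseBoolean(serverConf.getOrDefault(SERVER_KEY, "true"));
    }

    /** Client side: off unless the user consciously opts in. */
    static boolean clientOptedIn(Map<String, String> clientConf) {
        return Boolean.parseBoolean(clientConf.getOrDefault(CLIENT_KEY, "false"));
    }

    /** A call goes through only when both switches permit it. */
    static boolean callPermitted(Map<String, String> clientConf,
                                 Map<String, String> serverConf) {
        return clientOptedIn(clientConf) && serverAllows(serverConf);
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of();
        Map<String, String> optedIn = Map.of(CLIENT_KEY, "true");
        Map<String, String> adminOff = Map.of(SERVER_KEY, "false");

        System.out.println(callPermitted(defaults, defaults)); // false
        System.out.println(callPermitted(optedIn, defaults));  // true
        System.out.println(callPermitted(optedIn, adminOff));  // false
    }
}
```

With these defaults, a user who opts in can try the API against an untouched 
cluster with no restart, while an admin who objects can still turn it off.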
                
> Expose disk-location information for blocks to enable better scheduling
> -----------------------------------------------------------------------
>
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: design-doc-v1.pdf, design-doc-v2.pdf, hdfs-3672-1.patch, 
> hdfs-3672-2.patch, hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch, 
> hdfs-3672-6.patch, hdfs-3672-7.patch, hdfs-3672-8.patch
>
>
> Currently, HDFS exposes on which datanodes a block resides, which allows 
> clients to make scheduling decisions for locality and load balancing. 
> Extending this to also expose on which disk on a datanode a block resides 
> would enable even better scheduling, on a per-disk rather than coarse 
> per-datanode basis.
> This API would likely look similar to FileSystem#getFileBlockLocations, but 
> also involve a series of RPCs to the responsible datanodes to determine disk 
> ids.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
