[ 
https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429354#comment-13429354
 ] 

Aaron T. Myers commented on HDFS-3672:
--------------------------------------

Patch looks pretty good to me. A few comments:

# In DFSClient#getDiskBlockLocations, I recommend you add an instanceof check 
before the BlockLocation downcast to HdfsBlockLocation. Much better to throw a 
helpful RTE than some opaque ClassCastException.
# The DFSClient#getDiskBlockLocations method is huge, and has a few very 
distinct phases. I recommend you break this up into a few separate helper 
methods, e.g. one or two to initialize the data structures, one or two to 
perform the RPCs, one to re-associate the DN results with the correct block, 
etc.
# Unless I'm missing something, seems like you could easily make 
DiskBlockLocationCallable a static inner class.
# The javadoc parameter comment "@param blocks a List<LocatedBlock>" is not 
very helpful, since when the javadocs are generated the type of the parameter 
will automatically be included.
# The javadoc for DFSClient#getDiskBlockLocations should be a proper javadoc, 
i.e. with @param and @returns tags. I also recommend having this javadoc 
reference DistributedFileSystem#getFileDiskBlockLocations.
# In the new javadoc in DistributedFileSystem, you incorrectly say that this 
interface exists in the FileSystem class as well, and say "this is more helpful 
with DFS", which is the only implementation.
# I think you should change the LimitedPrivate InterfaceAudience annotations to 
Public, but keep the Unstable InterfaceStability annotations.
# Put a single space around your operators, e.g. "for (int i=0; 
i<blocks.size(); i++)"
# Unless I'm missing something, I don't think I see the ability to disable this 
feature, let alone it being off by default, as Arun requested.
                
> Expose disk-location information for blocks to enable better scheduling
> -----------------------------------------------------------------------
>
>                 Key: HDFS-3672
>                 URL: https://issues.apache.org/jira/browse/HDFS-3672
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, 
> hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch
>
>
> Currently, HDFS exposes on which datanodes a block resides, which allows 
> clients to make scheduling decisions for locality and load balancing. 
> Extending this to also expose on which disk on a datanode a block resides 
> would enable even better scheduling, on a per-disk rather than coarse 
> per-datanode basis.
> This API would likely look similar to Filesystem#getFileBlockLocations, but 
> also involve a series of RPCs to the responsible datanodes to determine disk 
> ids.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to