[ https://issues.apache.org/jira/browse/HDFS-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429354#comment-13429354 ]
Aaron T. Myers commented on HDFS-3672: -------------------------------------- Patch looks pretty good to me. A few comments: # In DFSClient#getDiskBlockLocations, I recommend you add an instanceof check before the BlockLocation downcast to HdfsBlockLocation. Much better to throw a helpful RTE than some opaque ClassCastException. # The DFSClient#getDiskBlockLocations method is huge, and has a few very distinct phases. I recommend you break this up into a few separate helper methods, e.g. one or two to initialize the data structures, one or two to perform the RPCs, one to re-associate the DN results with the correct block, etc. # Unless I'm missing something, seems like you could easily make DiskBlockLocationCallable a static inner class. # The javadoc parameter comment "@param blocks a List<LocatedBlock>" is not very helpful, since when the javadocs are generated the type of the parameter will automatically be included. # The javadoc for DFSClient#getDiskBlockLocations should be a proper javadoc, i.e. with @param and @returns tags. I also recommend having this javadoc reference DistributedFileSystem#getFileDiskBlockLocations. # In the new javadoc in DistributedFileSystem, you incorrectly say that this interface exists in the FileSystem class as well, and say "this is more helpful with DFS", which is the only implementation. # I think you should change the LimitedPrivate InterfaceAudience annotations to Public, but keep the Unstable InterfaceStability annotations. # Put a single space around your operators, e.g. "for (int i=0; i<blocks.size(); i++)" # Unless I'm missing something, I don't think I see the ability to disable this feature, let alone it being off by default, as Arun requested. > Expose disk-location information for blocks to enable better scheduling > ----------------------------------------------------------------------- > > Key: HDFS-3672 > URL: https://issues.apache.org/jira/browse/HDFS-3672 > Project: Hadoop HDFS > Issue Type: Improvement > Affects Versions: 2.0.0-alpha > Reporter: Andrew Wang > Assignee: Andrew Wang > Attachments: design-doc-v1.pdf, hdfs-3672-1.patch, hdfs-3672-2.patch, > hdfs-3672-3.patch, hdfs-3672-4.patch, hdfs-3672-5.patch > > > Currently, HDFS exposes on which datanodes a block resides, which allows > clients to make scheduling decisions for locality and load balancing. > Extending this to also expose on which disk on a datanode a block resides > would enable even better scheduling, on a per-disk rather than coarse > per-datanode basis. > This API would likely look similar to Filesystem#getFileBlockLocations, but > also involve a series of RPCs to the responsible datanodes to determine disk > ids. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira