[ https://issues.apache.org/jira/browse/HDFS-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640059#comment-13640059 ]
Aaron T. Myers commented on HDFS-4698:
--------------------------------------

Patch looks pretty good to me, though I do believe the findbugs warning is legitimate. A few little comments:

# It looks to me like the patch misses a place in DFSInputStream where it should be adding to the statistics before closing a BlockReader. Currently the patch only adds the stats in DFSInputStream#blockSeekTo, but I think they should also be added in DFSInputStream#close.
# Recommend you add a comment to DFSInputStream#getReadStatistics about how to use the API, i.e. that the stats will only be up-to-date after closing the DFSInputStream.
# Recommend adding comments to DFSInputStream.ReadStatistics explaining the meaning of the various fields, i.e. that SCR bytes will count for both SCR and "local bytes", that total >= local >= SCR, that remote bytes read can be determined by total - local, etc.
# For that matter, you might want to add a getRemoteBytesRead method to DFSInputStream.ReadStatistics to do the subtraction for the user.
# Any thoughts about how this new feature should interact with the existing FileSystem#Statistics class? Valid answers include "not at all" and/or "this will be helpful as-is, we can think about that later."

> provide client-side metrics for remote reads, local reads, and short-circuit reads
> ----------------------------------------------------------------------------------
>
> Key: HDFS-4698
> URL: https://issues.apache.org/jira/browse/HDFS-4698
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.0.3-alpha
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Minor
> Attachments: HDFS-4698.001.patch
>
>
> We should provide metrics to let clients know how many bytes of data they have read remotely, versus locally or via short-circuit local reads.
> This will allow clients to know how well they're doing at bringing the computation to the data, which will be useful in evaluating placement policies and cluster configurations.
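
For illustration only, the field relationships described in the review comments (SCR bytes also count as local bytes, total >= local >= SCR, remote = total - local) could be sketched roughly as below. This is not the code from HDFS-4698.001.patch: the class name ReadStatisticsSketch, the field names, and the add* methods are all hypothetical, and only getRemoteBytesRead is a name taken from the comment above.

```java
/**
 * Hypothetical stand-in for DFSInputStream.ReadStatistics.
 * Names and structure are assumptions made for illustration,
 * not the contents of the actual patch.
 */
class ReadStatisticsSketch {
    // Every byte read, whether remote, local, or short-circuit.
    private long totalBytesRead;
    // Bytes read from a replica on the local host (includes SCR bytes).
    private long totalLocalBytesRead;
    // Bytes read via short-circuit local reads (a subset of local bytes).
    private long totalShortCircuitBytesRead;

    /** Record bytes read from a remote DataNode: counts toward total only. */
    public synchronized void addRemoteBytes(long n) {
        totalBytesRead += n;
    }

    /** Record bytes read from a local DataNode: counts toward total and local. */
    public synchronized void addLocalBytes(long n) {
        totalBytesRead += n;
        totalLocalBytesRead += n;
    }

    /** Record short-circuit bytes: counts toward total, local, and SCR. */
    public synchronized void addShortCircuitBytes(long n) {
        totalBytesRead += n;
        totalLocalBytesRead += n;
        totalShortCircuitBytesRead += n;
    }

    public synchronized long getTotalBytesRead() {
        return totalBytesRead;
    }

    public synchronized long getTotalLocalBytesRead() {
        return totalLocalBytesRead;
    }

    public synchronized long getTotalShortCircuitBytesRead() {
        return totalShortCircuitBytesRead;
    }

    /** Convenience method that does the subtraction: remote = total - local. */
    public synchronized long getRemoteBytesRead() {
        return totalBytesRead - totalLocalBytesRead;
    }
}
```

With this layout, a caller who recorded 100 remote bytes, 50 local bytes, and 25 short-circuit bytes would see a total of 175, a local count of 75, an SCR count of 25, and a remote count of 100, so the invariant total >= local >= SCR holds by construction.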