[ 
https://issues.apache.org/jira/browse/HDFS-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640059#comment-13640059
 ] 

Aaron T. Myers commented on HDFS-4698:
--------------------------------------

Patch looks pretty good to me, though I do believe the findbugs warning is 
legitimate.

A few little comments:

# It looks to me like the patch misses a place in DFSInputStream where it 
should be adding to the statistics before closing a BlockReader. Currently the 
patch only adds the stats in DFSInputStream#blockSeekTo, but I think they 
should also be added in DFSInputStream#close.
# Recommend you add a comment to DFSInputStream#getReadStatistics about how to 
use the API, i.e. that the stats will only be up-to-date after closing the 
DFSInputStream.
# Recommend adding comments to DFSInputStream.ReadStatistics explaining the 
meaning of the various fields, i.e. that short-circuit read (SCR) bytes are 
counted both as SCR bytes and as "local bytes", that total >= local >= SCR, 
that remote bytes read can be computed as total - local, etc.
# For that matter, you might want to add a getRemoteBytesRead method to 
DFSInputStream.ReadStatistics to do the subtraction for the user.
# Any thoughts about how this new feature should interact with the existing 
FileSystem.Statistics class? Valid answers include "not at all" and/or "this 
will be helpful as-is, we can think about that later."
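
To illustrate comments 3 and 4, here is a rough sketch of the kind of 
statistics holder I have in mind. It is not taken from the patch: the class 
name ReadStatisticsSketch, the add* methods, and the field names are all 
hypothetical, and the real DFSInputStream.ReadStatistics may be laid out 
differently.

```java
// Hypothetical sketch only; names and structure are assumptions, not the patch.
final class ReadStatisticsSketch {
    // Invariant maintained by the add* methods below:
    // totalBytesRead >= totalLocalBytesRead >= totalShortCircuitBytesRead
    private long totalBytesRead;
    private long totalLocalBytesRead;
    private long totalShortCircuitBytesRead;

    // Bytes read from a remote datanode count toward the total only.
    synchronized void addRemoteBytes(long n) {
        totalBytesRead += n;
    }

    // Bytes read from a local datanode over the normal (non-SCR) path
    // count toward both the total and the local counter.
    synchronized void addLocalBytes(long n) {
        totalBytesRead += n;
        totalLocalBytesRead += n;
    }

    // Short-circuit reads are also local reads, so all three counters grow.
    synchronized void addShortCircuitBytes(long n) {
        totalBytesRead += n;
        totalLocalBytesRead += n;
        totalShortCircuitBytesRead += n;
    }

    synchronized long getTotalBytesRead() {
        return totalBytesRead;
    }

    synchronized long getTotalLocalBytesRead() {
        return totalLocalBytesRead;
    }

    synchronized long getTotalShortCircuitBytesRead() {
        return totalShortCircuitBytesRead;
    }

    // Convenience accessor from comment 4: does the subtraction for the caller.
    synchronized long getRemoteBytesRead() {
        return totalBytesRead - totalLocalBytesRead;
    }
}
```

With counters kept this way, a caller who reads 60 bytes remotely, 30 locally, 
and 10 via SCR would see total = 100, local = 40, SCR = 10, and 
getRemoteBytesRead() = 60.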
                
> provide client-side metrics for remote reads, local reads, and short-circuit 
> reads
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-4698
>                 URL: https://issues.apache.org/jira/browse/HDFS-4698
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4698.001.patch
>
>
> We should provide metrics to let clients know how many bytes of data they 
> have read remotely, versus locally or via short-circuit local reads.  This 
> will allow clients to know how well they're doing at bringing the computation 
> to the data, which will be useful in evaluating placement policies and 
> cluster configurations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
