[ https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407579#comment-13407579 ]
Hudson commented on HDFS-3170:
------------------------------

Integrated in Hadoop-Common-trunk-Commit #2427 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2427/])
    HDFS-3170. Add more useful metrics for write latency. Contributed by Matthew Jacobs. (Revision 1357970)

    Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1357970
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java

> Add more useful metrics for write latency
> -----------------------------------------
>
>                 Key: HDFS-3170
>                 URL: https://issues.apache.org/jira/browse/HDFS-3170
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Matthew Jacobs
>             Fix For: 2.0.1-alpha
>
>         Attachments: hdfs-3170.txt, hdfs-3170.txt, hdfs-3170.txt
>
>
> Currently, the only write-latency related metric we expose is the total
> amount of time taken by opWriteBlock. This is practically useless, since (a)
> different blocks may be wildly different sizes, and (b) if the writer is only
> generating data slowly, it will make a block write take longer by no fault of
> the DN. I would like to propose two new metrics:
> 1) *flush-to-disk time*: count how long it takes for each call to flush an
> incoming packet to disk (including the checksums).
> In most cases this will be close to 0, as it only flushes to buffer cache,
> but if the backing block device enters congested writeback, it can take much
> longer, which provides an interesting metric.
> 2) *round trip to downstream pipeline node*: track the round trip latency for
> the part of the pipeline between the local node and its downstream neighbors.
> When we add a new packet to the ack queue, save the current timestamp. When
> we receive an ack, update the metric based on how long since we sent the
> original packet. This gives a metric of the total RTT through the pipeline.
> If we also include this metric in the ack to upstream, we can subtract the
> amount of time due to the later stages in the pipeline and have an accurate
> count of this particular link.
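
The two proposed metrics can be illustrated with a minimal Java sketch. This is not the code committed in revision 1357970: the class `PipelineLatencySketch` and all of its method names are hypothetical, and it assumes that acks arrive in the same order the packets were enqueued and that each downstream ack carries the RTT the downstream node measured for its own part of the pipeline. Timestamps are passed in explicitly so the arithmetic is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the two metrics described in the issue. */
final class PipelineLatencySketch {

    /** A packet waiting for its ack, with the time it entered the ack queue. */
    private static final class Pending {
        final long seqno;
        final long enqueuedNanos;

        Pending(long seqno, long enqueuedNanos) {
            this.seqno = seqno;
            this.enqueuedNanos = enqueuedNanos;
        }
    }

    private final Deque<Pending> ackQueue = new ArrayDeque<>();
    private long flushCount;
    private long flushTotalNanos;
    private long lastTotalRttNanos;

    /** Metric 1: time a single flush call and add it to the running totals. */
    long timeFlush(Runnable flush) {
        long start = System.nanoTime();
        flush.run();
        long elapsed = System.nanoTime() - start;
        flushCount++;
        flushTotalNanos += elapsed;
        return elapsed;
    }

    /** Metric 2, step 1: remember when the packet was added to the ack queue. */
    void packetEnqueued(long seqno, long nowNanos) {
        ackQueue.addLast(new Pending(seqno, nowNanos));
    }

    /**
     * Metric 2, step 2: on ack, compute the total RTT through the rest of the
     * pipeline, then subtract the RTT reported by the downstream node to
     * estimate the latency of the local link alone.
     */
    long ackReceived(long seqno, long nowNanos, long downstreamReportedNanos) {
        Pending head = ackQueue.pollFirst();
        if (head == null || head.seqno != seqno) {
            throw new IllegalStateException("unexpected ack for seqno " + seqno);
        }
        lastTotalRttNanos = nowNanos - head.enqueuedNanos;
        // Clamp at zero in case the downstream report exceeds the local measurement.
        return Math.max(0L, lastTotalRttNanos - downstreamReportedNanos);
    }

    long lastTotalRttNanos() { return lastTotalRttNanos; }

    long flushCount() { return flushCount; }

    long flushTotalNanos() { return flushTotalNanos; }
}
```

For example, a packet enqueued at t = 1000 ns and acked at t = 6000 ns has a total RTT of 5000 ns. If the downstream node reported 3000 ns for its own part of the pipeline, the local link accounts for the remaining 2000 ns.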