[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649876#comment-14649876 ]
Allen Wittenauer commented on HDFS-6682:
----------------------------------------

We have no insight into how long a given replication might have been hanging around, so there is no way to really answer that question. We know the queue gets backed up during cascading DN failure events (thanks to the very slow NM memory checker + a fast-acting bad job + the Linux OOM killer!), so I was always under the impression that the whole queue is simply super busy rather than old entries never clearing. A rate might be useful to at least tell us whether the queue is stuck and/or to project how long it will remain behind.

> Add a metric to expose the timestamp of the oldest under-replicated block
> -------------------------------------------------------------------------
>
>                 Key: HDFS-6682
>                 URL: https://issues.apache.org/jira/browse/HDFS-6682
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Akira AJISAKA
>            Assignee: Akira AJISAKA
>              Labels: metrics
>         Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch
>
> In the following case, data in HDFS is lost and the client needs to put the same file again:
> # A client puts a file to HDFS.
> # A DataNode crashes before replicating a block of the file to other DataNodes.
> I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way the client can know which file to retain for the retry.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
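For illustration, the proposed gauge could be sketched roughly as below. This is a minimal, hypothetical sketch, not the actual NameNode code or the implementation in the attached patches: the class name, method names, and the FIFO-queue assumption are all invented here for clarity.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the metric proposed in HDFS-6682: track when each
// block entered the under-replicated queue so a gauge can report the
// timestamp of the oldest one still waiting. Names are illustrative only.
public class UnderReplicatedQueueMetrics {
    // Enqueue timestamps (ms) in arrival order; assumes FIFO replication.
    private final Deque<Long> enqueueTimesMs = new ArrayDeque<>();

    public synchronized void blockQueued(long nowMs) {
        enqueueTimesMs.addLast(nowMs);
    }

    public synchronized void blockReplicated() {
        enqueueTimesMs.pollFirst();
    }

    // Gauge value: timestamp of the oldest queued block, or 0 when empty.
    public synchronized long getOldestUnderReplicatedBlockTimeMs() {
        Long oldest = enqueueTimesMs.peekFirst();
        return oldest == null ? 0L : oldest;
    }
}
```

A rate, as suggested in the comment, could then be derived externally by sampling the queue size over time; the gauge alone answers "how stale is the oldest entry" while the rate answers "is the queue draining at all".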