[ 
https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649876#comment-14649876
 ] 

Allen Wittenauer commented on HDFS-6682:
----------------------------------------

We have no insight into how long a given replication might have been hanging 
around, so there's no way to really answer that question.  We know the queue 
gets backed up during cascading DN failure events (thanks to a very slow NM 
memory checker + a fast-acting bad job + the Linux OOM killer!), so I was 
always under the impression that the whole queue is simply super busy, rather 
than old entries never being cleared.  A rate metric might be useful to at 
least tell us whether the queue is stuck and/or give a projection of how long 
it will remain behind.
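
For illustration only, a back-of-the-envelope projection from queue size and 
an observed drain rate might look like the sketch below. The class and method 
names are made up and this is not part of any patch here:

{code:java}
// Hypothetical sketch: estimate how long the replication queue will stay
// behind, given its current size and an observed drain rate. All names and
// inputs are illustrative; nothing here is actual NameNode code.
public final class ReplicationQueueProjection {

  /**
   * @param queuedBlocks  current number of queued under-replicated blocks
   * @param drainedPerSec observed rate at which blocks leave the queue
   * @return estimated seconds until the queue catches up, or -1 if no
   *         forward progress is observed (i.e. the queue looks stuck)
   */
  public static long estimateCatchUpSeconds(long queuedBlocks, double drainedPerSec) {
    if (drainedPerSec <= 0) {
      return -1; // no forward progress observed: likely stuck
    }
    return (long) Math.ceil(queuedBlocks / drainedPerSec);
  }

  public static void main(String[] args) {
    // e.g. 120,000 queued blocks draining at 50 blocks/sec => 2400 seconds
    System.out.println(estimateCatchUpSeconds(120_000, 50.0));
  }
}
{code}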

> Add a metric to expose the timestamp of the oldest under-replicated block
> -------------------------------------------------------------------------
>
>                 Key: HDFS-6682
>                 URL: https://issues.apache.org/jira/browse/HDFS-6682
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Akira AJISAKA
>            Assignee: Akira AJISAKA
>              Labels: metrics
>         Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, 
> HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch
>
>
> In the following case, data in HDFS is lost and a client needs to put the 
> same file again:
> # A client puts a file to HDFS.
> # A DataNode crashes before replicating a block of the file to other DataNodes.
> I propose a metric to expose the timestamp of the oldest 
> under-replicated/corrupt block. That way a client can know which files to 
> retain for the retry.
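
For a rough idea of what such a metric could look like on the NameNode side, 
here is a minimal sketch using the Hadoop metrics2 MetricsRegistry. The class 
name, metric name, and update hook are hypothetical and are not taken from the 
attached patches:

{code:java}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// Hypothetical sketch of exposing the timestamp of the oldest queued
// under-replicated block as a metrics2 gauge. Where and how the gauge is
// updated in the real BlockManager is not shown here.
public class UnderReplicatedBlockInfo {

  private final MetricsRegistry registry = new MetricsRegistry("UnderReplicatedBlocks");

  // Gauge holding the enqueue timestamp (ms) of the oldest under-replicated
  // block; 0 would mean the queue is empty. Name is illustrative only.
  private final MutableGaugeLong oldestBlockTime = registry.newGauge(
      "TimeOfTheOldestBlockToBeReplicated",
      "Timestamp (ms) when the oldest queued under-replicated block was added",
      0L);

  // Called (hypothetically) whenever the head of the under-replication
  // queue changes.
  void updateOldestBlockTime(long timestampMs) {
    oldestBlockTime.set(timestampMs);
  }
}
{code}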



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
