[ https://issues.apache.org/jira/browse/HADOOP-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-4103:
---------------------------------

    Attachment: HADOOP-4103.patch


The patch adds alerts for missing blocks. A user can monitor this count in multiple ways (a rough sketch of how the count can be derived follows the list):

   # 'bin/hdfs dfsadmin -report' reports this count.
   # A warning is shown in red on the NameNode front page.
   # A new stat is added (e.g. for Simon).
        ** Also added a stat that reports the size of the corrupt-replicas map.
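
As referenced above, here is a rough, hypothetical sketch of how such a count can be 
derived on the fly from blocks already queued for replication, without the NameNode 
tracking any new state. The class and field names (BlockInfo, liveReplicas, 
neededReplications) are illustrative stand-ins, not the actual NameNode internals:

{code:java}
// Illustrative sketch only -- not the actual patch.
import java.util.List;

class MissingBlockCounter {

  // Hypothetical view of an entry in the under-replication queue.
  static class BlockInfo {
    final long blockId;
    final int liveReplicas;   // replicas currently reported by live datanodes

    BlockInfo(long blockId, int liveReplicas) {
      this.blockId = blockId;
      this.liveReplicas = liveReplicas;
    }
  }

  // A block is "missing" when it needs replication but has no live replica left.
  static long countMissingBlocks(List<BlockInfo> neededReplications) {
    long missing = 0;
    for (BlockInfo b : neededReplications) {
      if (b.liveReplicas == 0) {
        missing++;
      }
    }
    return missing;
  }
}
{code}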
  
Once the alert is noticed, an admin can run 'dfsadmin -metasave' to find out which 
specific blocks are missing. 'metasave' is improved a bit to list replica info 
for each block in the 'neededReplication' list, and the line for a missing block 
contains the word "MISSING".
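
For example, a minimal sketch of scanning such a dump for the missing blocks; it 
assumes the dump file was written with 'dfsadmin -metasave <filename>' and that, as 
described above, each missing block's line contains the word "MISSING":

{code:java}
// Minimal sketch, not part of the patch: print the replica-info lines for
// missing blocks from a metasave dump.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class FindMissingBlocks {
  public static void main(String[] args) throws IOException {
    String dump = args.length > 0 ? args[0] : "metasave.out";  // path to the dump file
    BufferedReader in = new BufferedReader(new FileReader(dump));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        if (line.contains("MISSING")) {
          System.out.println(line);  // replica info for a missing block
        }
      }
    } finally {
      in.close();
    }
  }
}
{code}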

This is a very non-intrusive change and thus fairly safe to backport: it adds no new 
state or data structures for the NN to track.

> Alert for missing blocks
> ------------------------
>
>                 Key: HADOOP-4103
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4103
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.17.2
>            Reporter: Christian Kunz
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-4103.patch
>
>
> A whole bunch of datanodes were marked dead because of network problems 
> resulting in heartbeat timeouts, although the datanodes themselves were fine.
> Many processes started to fail because of the corrupted filesystem.
> In order to catch and diagnose such problems faster, the namenode should 
> detect the corruption automatically and provide a way to alert operations. At 
> a minimum it should show the fact of corruption on the GUI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
