[
https://issues.apache.org/jira/browse/HADOOP-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raghu Angadi updated HADOOP-4103:
---------------------------------
Attachment: HADOOP-4103.patch
Attached is the patch for missing block alerts. A user can monitor this in multiple ways:
# 'bin/hdfs dfsadmin -report' reports this count (see the sketch after this list).
# A warning is displayed in red on the NameNode front page.
# A new stat is added (e.g. for Simon).
** Also added a stat to report the size of the corrupt replicas map.
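As an illustration of the first option above (not part of the patch), here is a minimal Python sketch that shells out to 'bin/hdfs dfsadmin -report' and pulls out the missing-block count; the label it greps for ("Missing blocks") is an assumption and may differ between versions.
{code}
# Hypothetical monitoring sketch: parse the missing-block count out of
# 'bin/hdfs dfsadmin -report'. The report label "Missing blocks" is an
# assumption; adjust it to whatever the running version actually prints.
import re
import subprocess
import sys

def missing_block_count(hdfs_cmd="bin/hdfs"):
    report = subprocess.check_output([hdfs_cmd, "dfsadmin", "-report"],
                                     universal_newlines=True)
    for line in report.splitlines():
        m = re.match(r"\s*Missing blocks\s*:\s*(\d+)", line)
        if m:
            return int(m.group(1))
    return None  # label not found; the report format may differ

if __name__ == "__main__":
    count = missing_block_count()
    if count is None:
        sys.exit("could not find a missing-block line in the report")
    print("missing blocks: %d" % count)
    sys.exit(1 if count > 0 else 0)
{code}
A cron job or monitoring hook could run something like this periodically and page operations when the exit status is non-zero.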
Once the alert is noticed, an admin can run 'dfsadmin -metasave' to find out which
specific blocks are missing. 'metasave' is improved a bit to list replica info
for each block in the 'neededReplication' list, and the line for a missing block
contains the word "MISSING".
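For illustration (again not part of the patch), the sketch below filters a metasave dump down to just the missing blocks. It assumes the dump written by 'dfsadmin -metasave <filename>' has been copied off the NameNode (the file is produced on the NameNode side, typically under its log directory) to a local path.
{code}
# Hypothetical helper: list the missing-block lines from a metasave dump.
# Relies only on the fact, stated in the patch description, that the line
# for a missing block contains the word "MISSING". The local file name
# 'blocks.meta' is an assumption.
import sys

def missing_blocks(metasave_path):
    with open(metasave_path) as f:
        return [line.rstrip("\n") for line in f if "MISSING" in line]

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "blocks.meta"
    for line in missing_blocks(path):
        print(line)
{code}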
This is a very non-intrusive change and thus fairly safe for backporting. It adds
no new state or data structures for the NN to track.
> Alert for missing blocks
> ------------------------
>
> Key: HADOOP-4103
> URL: https://issues.apache.org/jira/browse/HADOOP-4103
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Affects Versions: 0.17.2
> Reporter: Christian Kunz
> Assignee: Raghu Angadi
> Attachments: HADOOP-4103.patch
>
>
> A whole bunch of datanodes were marked dead because of network problems that
> caused heartbeat timeouts, although the datanodes themselves were fine.
> Many processes started to fail because of the corrupted filesystem.
> In order to catch and diagnose such problems faster, the namenode should
> detect the corruption automatically and provide a way to alert operations. At
> a minimum it should show the fact of corruption on the GUI.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.