[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode

Sameer Paranjpye (JIRA) Wed, 17 Oct 2007 12:59:41 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535705 ]


Sameer Paranjpye commented on HADOOP-2012:
------------------------------------------

> The checksum is verified on the client, and failures there are reported back 
> to DFS

Are these failures reported to the Namenode or to the Datanode? As far as I 
know, they go to the Namenode, which I believe does nothing with the reports 
beyond logging the failure. It would be better to report the failure to the 
Datanode, have the Datanode validate what is on disk, and report corruption to 
the Namenode if the validation fails.
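
To make the suggested flow concrete, here is a minimal sketch in Python rather than actual Hadoop code. The class and method names (Namenode, Datanode, handle_client_failure_report, and so on) are hypothetical, and CRC32 via zlib stands in for whatever checksum the Datanode stores. The client's report goes to the Datanode, which re-validates its on-disk copy and escalates to the Namenode if that check fails.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Namenode:
    """Hypothetical stand-in; only records confirmed corruption reports."""
    corrupt_blocks: list = field(default_factory=list)

    def report_corrupt_block(self, datanode_id, block_id):
        self.corrupt_blocks.append((datanode_id, block_id))


@dataclass
class Datanode:
    """Hypothetical stand-in holding block bytes and their stored CRC32s."""
    datanode_id: str
    namenode: Namenode
    blocks: dict = field(default_factory=dict)     # block_id -> bytes on disk
    checksums: dict = field(default_factory=dict)  # block_id -> stored CRC32

    def store_block(self, block_id, data):
        self.blocks[block_id] = data
        self.checksums[block_id] = zlib.crc32(data)

    def validate_block(self, block_id):
        """Recompute the checksum from the on-disk bytes and compare."""
        return zlib.crc32(self.blocks[block_id]) == self.checksums[block_id]

    def handle_client_failure_report(self, block_id):
        """Handle a client's checksum-failure report for block_id.

        Returns True if corruption was confirmed and escalated.
        """
        if self.validate_block(block_id):
            # On-disk copy is intact; the error was likely in transit
            # or on the client side, so the Namenode is not bothered.
            return False
        self.namenode.report_corrupt_block(self.datanode_id, block_id)
        return True
```

With this split, the Namenode sees only reports that the Datanode has confirmed against its own disk.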

> Periodic verification at the Datanode
> -------------------------------------
>
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>
> Currently, on-disk corruption of data blocks is detected only when a block is 
> read by a client or by another datanode.  Such errors would be detected much 
> earlier if the datanode periodically verified the checksums of its local 
> blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every couple 
> of weeks?)
> - How do we keep track of when a block was last verified (there is a .meta 
> file associated with each block)?
> - What action to take once corruption is detected
> - Scanning should be done at very low priority, with the rest of the datanode's 
> disk traffic in mind.
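
The issues listed in the description above can be illustrated with a small sketch. It is Python rather than Hadoop code, and every name in it is hypothetical. It assumes a per-block "last verified" timestamp kept in a dictionary (not the .meta file format), a two-week scan period, CRC32 checksums, and a simple per-pass byte budget as a stand-in for low-priority throttling.

```python
import zlib

TWO_WEEKS = 14 * 24 * 3600  # assumed scan period, in seconds


def blocks_due(last_verified, now, period=TWO_WEEKS):
    """Return block IDs not verified within `period`, oldest first."""
    due = [b for b, t in last_verified.items() if now - t >= period]
    return sorted(due, key=lambda b: last_verified[b])


def scan_pass(blocks, checksums, last_verified, now, max_bytes,
              period=TWO_WEEKS):
    """Verify due blocks until the byte budget for this pass is used up.

    Assumes max_bytes is at least the size of the largest block.
    Returns the list of block IDs whose checksum did not match.
    """
    corrupt = []
    budget = max_bytes
    for block_id in blocks_due(last_verified, now, period):
        data = blocks[block_id]
        if len(data) > budget:
            break  # out of budget; remaining blocks wait for a later pass
        budget -= len(data)
        if zlib.crc32(data) != checksums[block_id]:
            corrupt.append(block_id)
        last_verified[block_id] = now  # record that this block was checked
    return corrupt
```

What to do with the returned list of corrupt blocks is left open here, as it is in the description.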

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
