[ 
https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559110#action_12559110
 ] 

Raghu Angadi commented on HADOOP-2012:
--------------------------------------

Thanks Dhruba.

Regarding 1): It's not an extra RPC to the Namenode. When a client receives
the full block (say 64MB), it writes back 2 bytes to the Datanode, on the same
connection, to say that the checksum was ok. So the overhead is just 2 bytes.
I think it is probably not required to run benchmarks before committing this,
but we certainly can; I will talk to Mukund.
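
A minimal sketch of the 2-byte acknowledgement described above, for
illustration only. The class name, method names, and the CHECKSUM_OK value are
hypothetical and are not taken from the patch; in-memory streams stand in for
the existing client-to-Datanode connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ChecksumAckSketch {
    // Hypothetical status code; the actual wire value is not stated in this thread.
    static final short CHECKSUM_OK = 1;

    // Client side: after verifying the whole block, write a 2-byte status
    // on the stream that is already open to the Datanode (no new RPC).
    static void sendChecksumOk(DataOutputStream toDatanode) throws IOException {
        toDatanode.writeShort(CHECKSUM_OK);
        toDatanode.flush();
    }

    // Datanode side: read the 2-byte status and report whether the client
    // confirmed that the block's checksum was ok.
    static boolean clientVerified(DataInputStream fromClient) throws IOException {
        return fromClient.readShort() == CHECKSUM_OK;
    }

    public static void main(String[] args) throws IOException {
        // In-memory buffer standing in for the socket.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        sendChecksumOk(new DataOutputStream(wire));

        byte[] sent = wire.toByteArray();
        System.out.println("bytes on the wire: " + sent.length);

        boolean ok = clientVerified(
                new DataInputStream(new ByteArrayInputStream(sent)));
        System.out.println("datanode saw checksum ok: " + ok);
    }
}
```

The point of the sketch is only that the acknowledgement rides on the
connection already used for the block transfer, so the added cost is the two
bytes written by writeShort.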

2) Yes, we should add new stats to the Simon config.

3) Well noted. Right now periodic verification is quite separate; merging the
BlockMap in FSDataset with the one here will take quite a few code changes
(not complicated, but might look messy). We can certainly merge these when we
want to reduce memory. It should save around 64 bytes per block on a 64-bit
JVM.


> Periodic verification at the Datanode
> -------------------------------------
>
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, 
> HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, 
> HADOOP-2012.patch, HADOOP-2012.patch
>
>
> Currently on-disk data corruption on data blocks is detected only when it is 
> read by the client or by another datanode.  These errors would be detected 
> much earlier if the datanode could periodically verify the data checksums 
> for the local blocks.
> Some of the issues to consider :
> - How often should we check the blocks ( no more often than once every 
> couple of weeks ?)
> - How do we keep track of when a block was last verified ( there is a .meta 
> file associated with each block ).
> - What action to take once a corruption is detected
> - Scanning should be done as a very low priority with rest of the datanode 
> disk traffic in mind.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
