[ 
https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539179
 ] 

Sameer Paranjpye commented on HADOOP-2012:
------------------------------------------

Why not have a scan period only?

The scan period defines a window in which every block that exists at the 
beginning of the window will be examined (barring blocks that are deleted). A 
Datanode would construct a schedule for examining blocks in a scan period, with 
the least recently examined blocks going first. New blocks would be scheduled in 
the next window. The schedule could be constructed by dividing a window into 
_n_ intervals of length _scanperiod/n_, one interval per block. A Datanode would 
determine how much bandwidth it needs to scan a block based on when the 
next block is scheduled.
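
A rough sketch of how such a schedule might be built, in Python for brevity. 
The names Block, build_schedule and last_scan_time are made up for 
illustration and are not part of any patch on this issue; the sketch also 
assumes that a block must be fully read within its own interval.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical record; field names are illustrative only.
    block_id: str
    size_bytes: int
    last_scan_time: float  # seconds; 0.0 means never scanned


def build_schedule(blocks, window_start, scan_period):
    """Return (start_time, block, bytes_per_sec) tuples for one scan window.

    The window is split into n equal intervals of scan_period / n seconds,
    one per block, with the least recently examined blocks going first.
    """
    if not blocks:
        return []
    interval = scan_period / len(blocks)
    ordered = sorted(blocks, key=lambda b: b.last_scan_time)
    schedule = []
    for i, block in enumerate(ordered):
        start = window_start + i * interval
        # Bandwidth needed to finish this block before the next one starts.
        rate = block.size_bytes / interval
        schedule.append((start, block, rate))
    return schedule
```

Blocks that arrive after the window starts are simply not in the list passed 
to build_schedule, so they get picked up when the next window is planned.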

This would guarantee that every block that exists at the beginning of a scan 
period is examined once in the scan period. It would also guarantee an upper 
bound of 2*scan period between two scans of a given block. This is also an upper 
bound on the amount of time that elapses before a new block is scanned. In both 
cases, the time elapsed will, in the average case, be close to the scan period 
and approach 2*scan period if a large number of blocks are added in a window. 
These seem like reasonable guarantees.

It would make sense to have a reasonable upper bound on the amount of bandwidth 
used for scanning and emit a warning if this is not enough to examine all 
blocks in a scan period. That way, if someone sets a scan period of 1 minute or 
something else silly, the Datanode doesn't spend all its time scanning.
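
Something along these lines, sketched in Python. The function 
plan_scan_rate, its parameters and the logger name are hypothetical and do 
not correspond to any existing Datanode configuration.

```python
import logging

log = logging.getLogger("datanode.scanner")  # illustrative logger name


def plan_scan_rate(total_bytes, scan_period_sec, max_bytes_per_sec):
    """Return (bytes_per_sec, fits) for scanning total_bytes in one period.

    The rate is capped at max_bytes_per_sec. If the cap is too low to cover
    every block within the scan period, a warning is logged and fits is False.
    """
    if scan_period_sec <= 0:
        raise ValueError("scan_period_sec must be positive")
    required = total_bytes / scan_period_sec
    if required > max_bytes_per_sec:
        log.warning(
            "Scan period of %s s needs %.0f B/s but the cap is %.0f B/s; "
            "not all blocks will be examined in one period",
            scan_period_sec, required, max_bytes_per_sec,
        )
        return max_bytes_per_sec, False
    return required, True
```

The cap wins over the scan period here: a too-short period slows coverage 
down rather than letting scanning crowd out regular disk traffic.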




> Periodic verification at the Datanode
> -------------------------------------
>
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, 
> HADOOP-2012.patch
>
>
> Currently, on-disk corruption of data blocks is detected only when a block 
> is read by the client or by another datanode.  These errors would be detected 
> much earlier if the datanode could periodically verify the data checksums for 
> the local blocks.
> Some of the issues to consider :
> - How often should we check the blocks ( no more often than once every couple 
> of weeks ?)
> - How do we keep track of when a block was last verified ( there is a .meta 
> file associated with each block ).
> - What action to take once a corruption is detected
> - Scanning should be done as a very low priority with rest of the datanode 
> disk traffic in mind.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
