[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533809 ]
dhruba borthakur commented on HADOOP-2012:
------------------------------------------

Thinking more about this one, I agree with Rob that it is important to have an algorithm that eventually verifies all blocks even in the face of frequent datanode restarts. In fact, if we want the datanode to scale to a hundred thousand blocks, then this algorithm is essential.

Instead of storing the last modification time of each block, can we have some other algorithm where each block's metadata need not be updated every time a block is verified?

How about if we start verifying blocks in increasing blockid order and record the blockid that was most recently verified? Maybe we need to persist this information only once every 100 blocks or so. If we reach the largest known blockid, then we cycle back to the lowest blockid and start verifying from there. For a datanode that has 100K blocks, it will take only about 1MB of memory to keep a lazily-sorted list of blockids.

> Periodic verification at the Datanode
> -------------------------------------
>
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>
> Currently, on-disk corruption of data blocks is detected only when a block is
> read by the client or by another datanode. These errors would be detected much
> earlier if the datanode could periodically verify the data checksums for its
> local blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every couple
> of weeks?)
> - How do we keep track of when a block was last verified (there is a .meta
> file associated with each block)?
> - What action should be taken once a corruption is detected?
> - Scanning should be done at a very low priority, with the rest of the
> datanode disk traffic in mind.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
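
For illustration, the blockid-ordered scan proposed in the comment above could be sketched roughly as follows. This is only a minimal sketch in Python, not the datanode's actual code: the `BlockScanner` class, the JSON cursor file, the `verify` callback, and the `persist_every` parameter are all hypothetical names made up for the example.

```python
import bisect
import json
import os


class BlockScanner:
    """Hypothetical sketch: verify blocks in increasing blockid order and
    persist the scan position only once every `persist_every` blocks."""

    def __init__(self, block_ids, cursor_path, persist_every=100):
        self.block_ids = sorted(block_ids)   # in-memory sorted list of blockids
        self.cursor_path = cursor_path       # assumed location of the saved position
        self.persist_every = persist_every
        self.since_persist = 0
        self.last_verified = self._load_cursor()

    def _load_cursor(self):
        # A missing or unreadable cursor file means "start from the lowest blockid".
        try:
            with open(self.cursor_path) as f:
                return json.load(f)["last_verified"]
        except (OSError, ValueError, KeyError):
            return None

    def _persist_cursor(self):
        # Write to a temporary file and rename, so a crash never leaves a torn file.
        tmp = self.cursor_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"last_verified": self.last_verified}, f)
        os.replace(tmp, self.cursor_path)
        self.since_persist = 0

    def next_block(self):
        """Return the smallest blockid greater than the last verified one,
        cycling back to the lowest blockid after the largest."""
        if not self.block_ids:
            return None
        if self.last_verified is None:
            return self.block_ids[0]
        i = bisect.bisect_right(self.block_ids, self.last_verified)
        return self.block_ids[i % len(self.block_ids)]

    def verify_next(self, verify):
        """Verify one block via the caller-supplied `verify(block_id)` callback."""
        block_id = self.next_block()
        if block_id is None:
            return None
        verify(block_id)
        self.last_verified = block_id
        self.since_persist += 1
        if self.since_persist >= self.persist_every:
            self._persist_cursor()
        return block_id
```

With this shape, a restart resumes from the last persisted blockid, so at most `persist_every - 1` blocks are verified a second time, and no per-block metadata is written during the scan.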