[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghu Angadi updated HADOOP-2012:
---------------------------------

    Attachment: HADOOP-2012.patch

The next iteration of the patch, which stores the verification times in a log file (human-readable text). The above comment is still valid, except for the third bullet.

The log file is managed in the following manner:

- At any time there are up to two files in the top-level datanode directory: {{dncp_block_verification.log.curr}} and {{dncp_block_verification.log.prev}}.
- New verification times are appended to the {{curr}} file. Once it has {{5*num_of_blocks}} lines, it is rolled (i.e. {{prev}} is replaced by {{curr}}).
- The log file is managed by a new static class, {{LogFileHandler}}, in {{DataBlockScanner}}. It also provides an iterator.
- Each line is about 80 bytes now. It looks like:
{noformat}
date="2008-01-08 01:11:11,674" time="1199754671674" id="3144315593631418455"
{noformat}
  This is easily extendable. "date" is present just for readability and can be removed.
- During upgrade these files are copied, as opposed to blocks, which are hard linked. {{DataStorage.java}} is modified to copy any file that starts with "dncp_".

Review of this patch is much appreciated. I will add a test case in the meantime.

> Periodic verification at the Datanode
> -------------------------------------
>
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch,
> HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch
>
>
> Currently, on-disk corruption of a data block is detected only when the block is
> read by the client or by another datanode. These errors would be detected much
> earlier if the datanode periodically verified the data checksums for the local
> blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every couple of
> weeks?)
> - How do we keep track of when a block was last verified (there is a .meta
> file associated with each block)?
> - What action to take once a corruption is detected
> - Scanning should be done at a very low priority, with the rest of the datanode
> disk traffic in mind.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
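
For readers of the update above, the log line format and the rolling rule can be sketched roughly as follows. This is only an illustration, not the {{LogFileHandler}} code from the attached patch: the class name {{VerificationLogSketch}} and all of its methods are hypothetical, the "date" field is assumed to be the "time" value rendered in UTC (which happens to match the sample line), and the roll uses {{java.nio.file}}, which a patch from that period would not have had available.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and structure are assumptions, not the patch's code.
final class VerificationLogSketch {
    // Matches the three key="value" fields shown in the sample line.
    private static final Pattern LINE = Pattern.compile(
        "date=\"([^\"]*)\"\\s+time=\"(\\d+)\"\\s+id=\"(-?\\d+)\"");

    /** Formats one entry; "date" is derived from "time" purely for readability. */
    static String formatEntry(long blockId, long timeMillis) {
        SimpleDateFormat fmt =
            new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC
        return "date=\"" + fmt.format(new Date(timeMillis)) + "\""
            + " time=\"" + timeMillis + "\""
            + " id=\"" + blockId + "\"";
    }

    /** Returns {blockId, timeMillis}, or null if the line is malformed. */
    static long[] parseEntry(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        try {
            return new long[] {
                Long.parseLong(m.group(3)), Long.parseLong(m.group(2))
            };
        } catch (NumberFormatException e) {
            return null; // digits present but out of range for a long
        }
    }

    /** Rolling rule from the comment: roll once curr holds 5*num_of_blocks lines. */
    static boolean shouldRoll(long linesInCurr, long numBlocks) {
        return linesInCurr > 0 && linesInCurr >= 5 * numBlocks;
    }

    /** Rolls the log: prev is replaced by curr; new entries then go to a fresh curr. */
    static void roll(Path datanodeDir) throws IOException {
        Path curr = datanodeDir.resolve("dncp_block_verification.log.curr");
        Path prev = datanodeDir.resolve("dncp_block_verification.log.prev");
        Files.move(curr, prev, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because the same block can appear several times across {{prev}} and {{curr}}, a reader of these files would presumably keep only the latest "time" seen for each "id"; that detail is an inference from the append-only design and is not stated in the comment.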