[
https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raghu Angadi updated HADOOP-2012:
---------------------------------
Attachment: HADOOP-2012.patch
The next iteration of the patch, which stores the verification times in a log
file (human-readable text).
The above comment is still valid, except the third bullet.
The log file is managed in the following manner:
- At any time there are up to two files in the top-level datanode directory:
{{dncp_block_verification.log.curr}} and {{dncp_block_verification.log.prev}}.
- New verification times are appended to the {{curr}} file. Once it has
{{5*num_of_blocks}} lines, it is rolled (i.e. {{prev}} is replaced by
{{curr}}).
- The log file is managed by a new static class {{LogFileHandler}} in
{{DataBlockScanner}}. It also provides an iterator.
- Each line is currently about 80 bytes and looks like:
{noformat}
date="2008-01-08 01:11:11,674" time="1199754671674" id="3144315593631418455"
{noformat}
This is easily extendable. "date" is present just for readability and can be
removed.
- During upgrade these files are copied, as opposed to blocks, which are hard
linked. {{DataStorage.java}} is modified to copy any file whose name starts
with "dncp_".
A review of this patch would be much appreciated. I will add a test case in the
meantime.
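For reviewers who want the log handling in the bullets above in one place, here
is a rough sketch of the line format and the curr/prev rolling. This is not the
code in the patch: the class name {{VerificationLogSketch}}, its methods, the
optional "date" field in the parser, and the choice to roll just before the
next append are all assumptions made for illustration. The date is formatted in
the JVM default time zone.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.text.SimpleDateFormat;
import java.util.Collections;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; this is NOT the LogFileHandler from the patch.
class VerificationLogSketch {
    static final String CURR = "dncp_block_verification.log.curr";
    static final String PREV = "dncp_block_verification.log.prev";

    // Matches the key="value" layout of a log line; "date" is treated as optional.
    private static final Pattern ENTRY = Pattern.compile(
        "^(?:date=\"[^\"]*\"\\s+)?time=\"(\\d+)\"\\s+id=\"(-?\\d+)\"\\s*$");

    private final Path dir;
    private final long maxLines;   // e.g. 5 * number of blocks
    private long currLines;        // lines currently in the curr file

    VerificationLogSketch(Path dir, long maxLines) throws IOException {
        this.dir = dir;
        this.maxLines = maxLines;
        Path curr = dir.resolve(CURR);
        this.currLines = Files.exists(curr) ? Files.readAllLines(curr).size() : 0;
    }

    /** Builds one log line for a block verified at timeMs (epoch milliseconds). */
    static String formatLine(long timeMs, long blockId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
        return "date=\"" + fmt.format(new Date(timeMs)) + "\""
            + " time=\"" + timeMs + "\""
            + " id=\"" + blockId + "\"";
    }

    /** Returns {timeMs, blockId}, or null if the line is malformed. */
    static long[] parseLine(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.matches()) {
            return null;
        }
        try {
            return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
        } catch (NumberFormatException e) {
            return null;   // digits that do not fit in a long
        }
    }

    /** Appends one verification time, rolling first if curr is already full. */
    void append(long timeMs, long blockId) throws IOException {
        if (currLines >= maxLines) {
            roll();
        }
        Files.write(dir.resolve(CURR),
            Collections.singletonList(formatLine(timeMs, blockId)),
            StandardCharsets.UTF_8,
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        currLines++;
    }

    /** Replaces prev with curr; the next append starts a fresh curr file. */
    private void roll() throws IOException {
        Files.move(dir.resolve(CURR), dir.resolve(PREV),
            StandardCopyOption.REPLACE_EXISTING);
        currLines = 0;
    }
}
```

With {{maxLines}} set to 2, three appends leave two lines in {{prev}} and one in
{{curr}}. Since "date" is optional in the parser, dropping that field later
would not invalidate existing readers.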
> Periodic verification at the Datanode
> -------------------------------------
>
> Key: HADOOP-2012
> URL: https://issues.apache.org/jira/browse/HADOOP-2012
> Project: Hadoop
> Issue Type: New Feature
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.16.0
>
> Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch,
> HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch
>
>
> Currently, on-disk corruption of data blocks is detected only when the block
> is read by a client or by another datanode. These errors would be detected
> much earlier if the datanode periodically verified the data checksums of its
> local blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every couple
> of weeks?)
> - How do we keep track of when a block was last verified (there is a .meta
> file associated with each block)?
> - What action should be taken once corruption is detected?
> - Scanning should be done at a very low priority, with the rest of the
> datanode's disk traffic in mind.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.