[ 
https://issues.apache.org/jira/browse/HADOOP-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated HADOOP-1134:
---------------------------------

    Attachment: bc-no-upgrade-05302007.patch

The current patch is attached. It is VERY EXPERIMENTAL. A few notes:
 
 # Applies to trunk as of 05/30/07.
 # This will not start on an existing DFS. You need to format first with 
this patch.
 # DFS is fully functional and is not expected to be any slower than the 
current DFS. But this is in no way close to being the final patch.
 # There is a big chunk of upgrade-related code in DataNode.java and a 
little bit in FSNamesystem.java, but it is not currently executed.
 # {{DistributedFileSystem}} in trunk becomes {{ChecksumDistributedFileSystem}} 
with the patch. {{DistributedFileSystem}} in the patch does not create {{.crc}} 
files.
 # The core functionality and protocol are described in the HTML file 
attached with the previous comment.
 # The important changes are in DFSClient.java and DataNode.java.
 # The {{DFSClient.BlockReader}} class handles reading from a datanode and 
unpacks the data; a minimal sketch of this read path appears after this list.
 # There are quite a few {{XXX}} comments; these will be handled in later 
patches.
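
To make the {{DFSClient.BlockReader}} item above concrete, here is a minimal 
sketch of the kind of chunk-plus-checksum unpacking a block reader performs. 
The wire layout assumed here ([4-byte CRC32][chunk bytes] per chunk) and all 
class and method names are invented for illustration; the actual protocol is 
the one described in the attached HTML design document.

{code:java}
import java.io.DataInputStream;
import java.io.IOException;
import java.util.zip.CRC32;

/**
 * Minimal sketch of chunk-level CRC verification on the read path.
 * Assumed (invented) wire layout: each chunk arrives as
 * [4-byte CRC32][chunk bytes].
 */
public class ChunkedCrcReader {

  private static final int CHUNK_SIZE = 512; // bytes covered per checksum

  /** Reads one chunk, verifies its checksum, and returns the payload. */
  public static byte[] readVerifiedChunk(DataInputStream in, int remaining)
      throws IOException {
    long expected = in.readInt() & 0xffffffffL; // CRC sent ahead of the data
    byte[] chunk = new byte[Math.min(remaining, CHUNK_SIZE)];
    in.readFully(chunk);

    CRC32 crc = new CRC32();
    crc.update(chunk, 0, chunk.length);
    if (crc.getValue() != expected) {
      // A real client would mark this replica bad and retry another datanode.
      throw new IOException("CRC mismatch while reading block chunk");
    }
    return chunk;
  }
}
{code}

Reading a whole block is then just a loop over {{readVerifiedChunk}} until 
the block length is exhausted.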


> Block level CRCs in HDFS
> ------------------------
>
>                 Key: HADOOP-1134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1134
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: bc-no-upgrade-05302007.patch, 
> DfsBlockCrcDesign-05305007.htm
>
>
> Currently, CRCs are handled at the FileSystem level and are transparent 
> to core HDFS. See the recent improvement HADOOP-928 (which can add 
> checksums to a given filesystem) for more about this. Though this has 
> served us well, there are a few disadvantages:
> 1) It doubles the namespace in HDFS (or other filesystem 
> implementations). In many cases, it nearly doubles the number of blocks. 
> Taking the namenode out of CRCs would nearly double namespace 
> performance, both in terms of CPU and memory.
> 2) Since CRCs are transparent to HDFS, it cannot actively detect 
> corrupted blocks. With block-level CRCs, the datanode can periodically 
> verify the checksums and report corruptions to the namenode so that new 
> replicas can be created (a rough sketch of such a scan follows this 
> description).
> We propose to have CRCs maintained for all HDFS data in much the same 
> way as in GFS. I will update the jira with detailed requirements and 
> design. This will include the same guarantees provided by the current 
> implementation and will include an upgrade of current data.
>  
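
For readers skimming the quoted description, here is a very rough sketch of 
the periodic datanode verification mentioned in point 2. Everything in it is 
invented for illustration (a single whole-block CRC per replica, the names, 
the reporting path); the actual design is in the attached document.

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

/**
 * Hypothetical periodic block scanner. The on-disk layout, names, and
 * reporting path are invented; this only illustrates the idea of a
 * datanode re-verifying stored checksums in the background.
 */
public class BlockScanSketch {

  /** Recomputes a block file's CRC32 and compares it to the stored value. */
  static boolean replicaIsHealthy(File blockFile, long storedCrc)
      throws IOException {
    CRC32 crc = new CRC32();
    try (CheckedInputStream in =
             new CheckedInputStream(new FileInputStream(blockFile), crc)) {
      byte[] buf = new byte[64 * 1024];
      while (in.read(buf) != -1) {
        // stream through the whole block; CheckedInputStream updates the CRC
      }
    }
    return crc.getValue() == storedCrc;
  }

  /** A real datanode would report corrupt replicas to the namenode here. */
  static void scan(File[] blocks, long[] storedCrcs) throws IOException {
    for (int i = 0; i < blocks.length; i++) {
      if (!replicaIsHealthy(blocks[i], storedCrcs[i])) {
        System.err.println("corrupt replica " + blocks[i]
            + ": would ask the namenode to re-replicate");
      }
    }
  }
}
{code}

A per-chunk checksum layout would only change the inner loop; the 
scan-and-report structure stays the same.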

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
