[ https://issues.apache.org/jira/browse/HADOOP-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629256#action_12629256 ]
Doug Cutting commented on HADOOP-3981:
--------------------------------------
> Which API are you talking about, FileSystem API or HDFS API?
FileSystem. And that second method above should have been:
- Checksum getChecksum(Path, String algorithm)
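For concreteness, a minimal sketch of that parameterized form. All of the names here are illustrative stand-ins, not the real org.apache.hadoop.fs types or an actual Hadoop method:

{code:java}
import java.io.IOException;

// Hypothetical stand-ins; not the real org.apache.hadoop.fs types.
class Path {
  private final String path;
  Path(String path) { this.path = path; }
}

class Checksum {
  private final byte[] bytes;
  Checksum(byte[] bytes) { this.bytes = bytes; }
}

// Sketch of the parameterized form above: the algorithm name would carry
// the algorithm's parameters, e.g. a hypothetical "MD5-of-512CRC32"
// meaning MD5 over CRC32s of 512-byte chunks.
abstract class FileSystemSketch {
  public abstract Checksum getChecksum(Path f, String algorithm)
      throws IOException;
}
{code}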
But this is probably overkill for now. Let's just choose a single algorithm
and put its parameters in the algorithm name. We have to change the
ClientDatanodeProtocol in any case. We can either compute per-block checksums
on the datanode, or send the CRCs to the client and sum them there. Let's
pick just one or the other for the first version, though. My preference would
be to compute per-block checksums on the datanode (sketched below), but I
don't feel strongly about it and would not veto the other approach.
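To make the preferred option concrete, here is a minimal sketch, assuming a hypothetical BlockChecksumFetcher interface in place of whatever ClientDatanodeProtocol call would actually be added. Each datanode digests its own block locally; the client only digests the small per-block checksums, in block order, to produce the file checksum. This is a sketch of the idea, not the eventual implementation:

{code:java}
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class DistributedFileChecksum {

  /** Hypothetical stand-in for the new ClientDatanodeProtocol call. */
  public interface BlockChecksumFetcher {
    /** Returns the checksum the datanode computed over one block. */
    byte[] getBlockChecksum(long blockId) throws IOException;
  }

  /**
   * Combine per-block checksums into one file checksum by digesting the
   * per-block digests in block order. The client never touches block
   * data; each block's digest is computed where the block lives.
   */
  public static byte[] fileChecksum(List<Long> blockIds,
                                    BlockChecksumFetcher fetcher)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    for (long blockId : blockIds) {
      md5.update(fetcher.getBlockChecksum(blockId)); // datanode-side digest
    }
    return md5.digest(); // e.g. an "MD5-of-<block checksums>" file checksum
  }
}
{code}

Either way the client never reads block data itself; with this variant it also avoids pulling every CRC over the wire, at the cost of a new datanode-side computation.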
> Need a distributed file checksum algorithm for HDFS
> ---------------------------------------------------
>
> Key: HADOOP-3981
> URL: https://issues.apache.org/jira/browse/HADOOP-3981
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Reporter: Tsz Wo (Nicholas), SZE
>
> Traditional message digest algorithms, like MD5, SHA-1, etc., require reading
> the entire input message sequentially in a central location. HDFS supports
> large files of multiple terabytes, so the overhead of reading an entire
> file is huge. A distributed file checksum algorithm is needed for HDFS.