[ https://issues.apache.org/jira/browse/HADOOP-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629256#action_12629256 ]
Doug Cutting commented on HADOOP-3981:
--------------------------------------
> Which API are you talking about, FileSystem API or HDFS API?
FileSystem. And that second method above should have been:
- Checksum getChecksum(Path, String algorithm)
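For concreteness, a minimal sketch of that parameterized form. All of the names here are illustrative stand-ins, not the real org.apache.hadoop.fs types or an actual Hadoop method:

{code:java}
import java.io.IOException;

// Hypothetical stand-ins; not the real org.apache.hadoop.fs types.
class Path {
  private final String path;
  Path(String path) { this.path = path; }
}

class Checksum {
  private final byte[] bytes;
  Checksum(byte[] bytes) { this.bytes = bytes; }
}

// Sketch of the parameterized form above: the algorithm name would carry
// the algorithm's parameters, e.g. a hypothetical "MD5-of-512CRC32"
// meaning MD5 over CRC32s of 512-byte chunks.
abstract class FileSystemSketch {
  public abstract Checksum getChecksum(Path f, String algorithm)
      throws IOException;
}
{code}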
But this is probably overkill for now. Let's just choose a single algorithm
and put its parameters in the algorithm name. We have to change the
ClientDatanodeProtocol in any case. We can either compute per-block checksums
on the datanode, or send the CRCs to the client and sum them there. Let's
pick just one or the other for the first version, though. My preference would
be to compute per-block checksums on the datanode (sketched below), but I
don't feel strongly about it and would not veto the other approach.
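To make the preferred option concrete, here is a minimal sketch, assuming a hypothetical BlockChecksumFetcher interface in place of whatever ClientDatanodeProtocol call would actually be added. Each datanode digests its own block locally; the client only digests the small per-block checksums, in block order, to produce the file checksum. This is a sketch of the idea, not the eventual implementation:

{code:java}
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class DistributedFileChecksum {

  /** Hypothetical stand-in for the new ClientDatanodeProtocol call. */
  public interface BlockChecksumFetcher {
    /** Returns the checksum the datanode computed over one block. */
    byte[] getBlockChecksum(long blockId) throws IOException;
  }

  /**
   * Combine per-block checksums into one file checksum by digesting the
   * per-block digests in block order. The client never touches block
   * data; each block's digest is computed where the block lives.
   */
  public static byte[] fileChecksum(List<Long> blockIds,
                                    BlockChecksumFetcher fetcher)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    for (long blockId : blockIds) {
      md5.update(fetcher.getBlockChecksum(blockId)); // datanode-side digest
    }
    return md5.digest(); // e.g. an "MD5-of-<block checksums>" file checksum
  }
}
{code}

Either way the client never reads block data itself; with this variant it also avoids pulling every CRC over the wire, at the cost of a new datanode-side computation.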
> Need a distributed file checksum algorithm for HDFS
> ---------------------------------------------------
>
> Key: HADOOP-3981
> URL: https://issues.apache.org/jira/browse/HADOOP-3981
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Reporter: Tsz Wo (Nicholas), SZE
>
> Traditional message digest algorithms, like MD5, SHA-1, etc., require reading
> the entire input message sequentially in a central location. HDFS supports
> large files of multiple terabytes, so the overhead of reading an entire
> file is huge. A distributed file checksum algorithm is needed for HDFS.