[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450075#comment-13450075
 ] 

Marcelo Vanzin commented on HDFS-3889:
--------------------------------------

bq. I believe that the modification time is set based on the NN, not the 
clients. So nothing needs to be kept in sync.

You have two NNs. The metadata on the target NN needs to be in sync with 
the source NN for the metadata-based check to do the right thing.

In the end, my opinion is just that metadata-based checks are a very poor 
substitute for checksums, and can much more easily generate false positives 
(i.e. say that files are equal when they're not). But if it's a feature that 
people find useful, why not. The false negative case is not such a big problem, 
since it would just waste bandwidth by forcing the copy.
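
To make the false positive case concrete, here is a minimal sketch. It is not 
distcp code: FileInfo, metadataSaysEqual and checksumSaysEqual are hypothetical 
names, and java.util.zip.CRC32 only stands in for whatever checksum the file 
system actually provides.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class MetadataVsChecksum {
    /** Hypothetical stand-in for a file: its bytes plus the mtime recorded by its NN. */
    static final class FileInfo {
        final byte[] data;
        final long modificationTime;

        FileInfo(byte[] data, long modificationTime) {
            this.data = data;
            this.modificationTime = modificationTime;
        }
    }

    /** Metadata-based check: compares only length and modification time. */
    static boolean metadataSaysEqual(FileInfo a, FileInfo b) {
        return a.data.length == b.data.length
            && a.modificationTime == b.modificationTime;
    }

    /** CRC32 over the contents; a stand-in for a real file checksum. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Checksum-based check: compares a digest of the actual contents. */
    static boolean checksumSaysEqual(FileInfo a, FileInfo b) {
        return crc32(a.data) == crc32(b.data);
    }

    public static void main(String[] args) {
        // Same length and same mtime, but the last byte differs.
        FileInfo source = new FileInfo("data-v1".getBytes(StandardCharsets.UTF_8), 1000L);
        FileInfo target = new FileInfo("data-v2".getBytes(StandardCharsets.UTF_8), 1000L);

        System.out.println("metadata says equal: " + metadataSaysEqual(source, target));
        System.out.println("checksum says equal: " + checksumSaysEqual(source, target));
    }
}
```

With these inputs the metadata check reports the files as equal and the 
checksum check does not, which is the case where a copy would be skipped 
incorrectly.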
                
> distcp overwrites files even when there are missing checksums
> -------------------------------------------------------------
>
>                 Key: HDFS-3889
>                 URL: https://issues.apache.org/jira/browse/HDFS-3889
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 2.2.0-alpha
>            Reporter: Colin Patrick McCabe
>            Priority: Minor
>
> If distcp can't read the checksum files for the source and destination 
> files-- for any reason-- it ignores the checksums and overwrites the 
> destination file.  It does produce a log message, but I think the correct 
> behavior would be to throw an error and stop the distcp.
> If the user really wants to ignore checksums, he or she can use 
> {{-skipcrccheck}} to do so.
> The relevant code is in DistCpUtils#checksumsAreEquals:
> {code}
>     try {
>       sourceChecksum = sourceFS.getFileChecksum(source);
>       targetChecksum = targetFS.getFileChecksum(target);
>     } catch (IOException e) {
>       LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
>     }
> {code}
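
The behavior proposed in the description (fail instead of logging and 
carrying on) could look roughly like the sketch below. It is only an 
illustration, not a patch: ChecksumProvider is a hypothetical stand-in for the 
file system calls, checksums are modeled as plain strings, and treating a null 
checksum as an error is an assumption of this sketch.

```java
import java.io.IOException;

public class StrictChecksumCheck {
    /** Hypothetical stand-in for the file system's checksum lookup. */
    interface ChecksumProvider {
        String getFileChecksum(String path) throws IOException;
    }

    /**
     * Compares checksums and propagates any failure to read them,
     * so the caller can stop the copy instead of overwriting the target.
     */
    static boolean checksumsAreEqual(ChecksumProvider sourceFS, String source,
                                     ChecksumProvider targetFS, String target)
            throws IOException {
        String sourceChecksum;
        String targetChecksum;
        try {
            sourceChecksum = sourceFS.getFileChecksum(source);
            targetChecksum = targetFS.getFileChecksum(target);
        } catch (IOException e) {
            // Rethrow with context rather than logging and continuing.
            throw new IOException(
                "Unable to retrieve checksum for " + source + " or " + target, e);
        }
        // Assumption of this sketch: a missing checksum is also an error.
        if (sourceChecksum == null || targetChecksum == null) {
            throw new IOException(
                "Checksum unavailable for " + source + " or " + target);
        }
        return sourceChecksum.equals(targetChecksum);
    }

    public static void main(String[] args) throws IOException {
        ChecksumProvider ok = path -> "abc123";
        ChecksumProvider broken = path -> {
            throw new IOException("checksum file unreadable");
        };

        System.out.println("equal: " + checksumsAreEqual(ok, "/src/f", ok, "/dst/f"));
        try {
            checksumsAreEqual(ok, "/src/f", broken, "/dst/f");
        } catch (IOException e) {
            System.out.println("copy aborted: " + e.getMessage());
        }
    }
}
```

A user who really wants to ignore checksums would skip this check entirely 
with {{-skipcrccheck}}, as the description says.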

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira