[ https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450052#comment-13450052 ]
Marcelo Vanzin commented on HDFS-3889:
--------------------------------------

bq. In the absence of CRCs, it should also be based on modtime and other file metadata, not just size.

If the goal is just to provide the same functionality as rsync, then sure. That said, I consider those less reliable than (or just as bad as) file size alone. They require the metadata to be kept in sync between source and destination, which I don't think is very common for modification time or access time, for example.

> distcp overwrites files even when there are missing checksums
> -------------------------------------------------------------
>
>                 Key: HDFS-3889
>                 URL: https://issues.apache.org/jira/browse/HDFS-3889
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 2.2.0-alpha
>            Reporter: Colin Patrick McCabe
>            Priority: Minor
>
> If distcp can't read the checksum files for the source and destination
> files-- for any reason-- it ignores the checksums and overwrites the
> destination file. It does produce a log message, but I think the correct
> behavior would be to throw an error and stop the distcp.
> If the user really wants to ignore checksums, he or she can use
> {{-skipcrccheck}} to do so.
> The relevant code is in DistCpUtils#checksumsAreEqual:
> {code}
>     try {
>       sourceChecksum = sourceFS.getFileChecksum(source);
>       targetChecksum = targetFS.getFileChecksum(target);
>     } catch (IOException e) {
>       LOG.error("Unable to retrieve checksum for " + source + " or " +
>                 target, e);
>     }
> {code}
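
The stricter behavior proposed in the issue description could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual DistCp patch: the class StrictChecksumCheck, the ChecksumReader interface (a stand-in for FileSystem#getFileChecksum), and the skipCrcCheck parameter (modeling {{-skipcrccheck}}) are all hypothetical names, and checksums are modeled as plain byte arrays so the example has no Hadoop dependency.

```java
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical sketch: fail instead of silently ignoring unreadable checksums. */
class StrictChecksumCheck {

    /** Hypothetical stand-in for a call such as FileSystem#getFileChecksum(Path). */
    interface ChecksumReader {
        byte[] read() throws IOException;
    }

    /**
     * Compares two checksums. If either one cannot be read, the IOException is
     * rethrown (rather than logged and swallowed) so the copy can be aborted.
     * skipCrcCheck models an explicit opt-out in the spirit of -skipcrccheck.
     */
    static boolean checksumsAreEqual(ChecksumReader source, ChecksumReader target,
                                     boolean skipCrcCheck) throws IOException {
        if (skipCrcCheck) {
            return true; // the user explicitly asked to ignore checksums
        }
        final byte[] sourceChecksum;
        final byte[] targetChecksum;
        try {
            sourceChecksum = source.read();
            targetChecksum = target.read();
        } catch (IOException e) {
            // Propagate the failure instead of only logging it.
            throw new IOException("Unable to retrieve checksum for source or target", e);
        }
        return Arrays.equals(sourceChecksum, targetChecksum);
    }

    public static void main(String[] args) {
        try {
            boolean equal = checksumsAreEqual(
                () -> new byte[] {1, 2, 3},
                () -> { throw new IOException("checksum file missing"); },
                false);
            System.out.println("checksums equal: " + equal);
        } catch (IOException e) {
            System.out.println("aborting copy: " + e.getMessage());
        }
    }
}
```

With this shape, a caller would have to decide explicitly what to do when a checksum is unavailable; the only way to get the old "ignore and overwrite" behavior is to pass the opt-out flag.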