[ https://issues.apache.org/jira/browse/HDFS-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019629#comment-14019629 ]
LiuLei commented on HDFS-6491:
------------------------------

I think we can generate an MD5 checksum for each block file of an HDFS file and then compare the checksums. If the MD5 checksums are the same in the two HDFS clusters, the HDFS file contents are the same.

> proposal for developing a tool to compare files/dirs
> ----------------------------------------------------
>
>                 Key: HDFS-6491
>                 URL: https://issues.apache.org/jira/browse/HDFS-6491
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: tools
>    Affects Versions: 2.4.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> We have a tool, distcp, that copies files much like the unix/linux cp command
> but in a distributed way. However, we don't have a tool to compare
> files/dirs. I think providing such a tool would be helpful. We can name it
> distdiff to be consistent with distcp.
>
> Right now I'm thinking about providing some basic functionality as a starting
> point; we can add more features or performance improvements later.
> I had the opportunity to discuss this with [~daryn] and [~szetszwo] in person.
> Many thanks to both of them for their very valuable input.
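
To make LiuLei's suggestion concrete, here is a minimal sketch of the per-block comparison. It is not a proposed implementation for distdiff: in-memory byte arrays stand in for HDFS block files, and the class name `BlockDigestCompare`, the methods `blockDigests` and `sameContent`, and the tiny block size are all hypothetical names and values chosen for illustration. Only `java.security.MessageDigest` from the JDK is used; no HDFS API is involved.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: byte arrays stand in for the block files of an HDFS file.
public class BlockDigestCompare {

    // Returns one MD5 digest per fixed-size block; the last block may be shorter.
    static List<byte[]> blockDigests(byte[] data, int blockSize)
            throws NoSuchAlgorithmException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        List<byte[]> digests = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            md5.update(data, off, len);
            digests.add(md5.digest()); // digest() also resets md5 for the next block
        }
        return digests;
    }

    // Two "files" are treated as equal when they have the same number of blocks
    // and every pair of block digests matches.
    static boolean sameContent(byte[] a, byte[] b, int blockSize)
            throws NoSuchAlgorithmException {
        List<byte[]> da = blockDigests(a, blockSize);
        List<byte[]> db = blockDigests(b, blockSize);
        if (da.size() != db.size()) {
            return false;
        }
        for (int i = 0; i < da.size(); i++) {
            if (!MessageDigest.isEqual(da.get(i), db.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        byte[] src = "distcp source file content".getBytes(StandardCharsets.UTF_8);
        byte[] copy = src.clone();
        byte[] corrupted = src.clone();
        corrupted[5] ^= 1; // flip one bit in the first block

        System.out.println(sameContent(src, copy, 8));
        System.out.println(sameContent(src, corrupted, 8));
    }
}
```

Two caveats with this approach: the digests only line up when both clusters split the file with the same block size, and matching MD5 digests show that the contents are equal with very high probability rather than with certainty.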