[ https://issues.apache.org/jira/browse/MAPREDUCE-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo (Nicholas), SZE updated MAPREDUCE-648:
---------------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I have committed this. Thanks, Ravi!

> Two distcp bugs
> ---------------
>
>                 Key: MAPREDUCE-648
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-648
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Minor
>             Fix For: 0.21.0
>
>         Attachments: d_648_644.patch, d_dirCount648.patch,
>                      d_dirCount648.v1.patch, d_dirCount_648.patch
>
>
> h4. 1. distcp -update launches a job when there is at least one dir in the
> source paths to be copied, even though there is nothing to copy.
> HADOOP-5675 added a fileCount > 0 check to decide whether to launch the
> job, and HADOOP-5762 changed this to fileCount + dirCount > 0 to solve the
> issue of empty directories not getting copied to the destination. With
> -update, dirCount is incremented without checking whether that dir already
> exists at the destination, so the distcp job is launched because
> dirCount > 0 even though there is nothing to copy. With -update,
> incrementing dirCount can be skipped if that dir already exists at the
> destination.
>
> h4. 2. distcp doesn't skip copying the file when we do -update on a single
> file and the destfile already exists.
> When we do
>     hadoop distcp -update srcfilename destfilename
> it seems to compare the checksums of srcfilename and
> destfilename/srcfilename, so the copy is not skipped. It should compare the
> checksums of srcfilename and destfilename.
>
> See also MAPREDUCE-644.
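
The two conditions described in the issue can be illustrated with a minimal, self-contained Java sketch. This is not the committed patch and does not use the Hadoop API: the class `DistcpUpdateSketch`, its method names, and the use of plain strings and a `Set` to stand in for destination paths are all hypothetical. The fallback to `dst/<source name>` when the destination is not an existing file is likewise an assumption, not something the issue states.

```java
import java.util.HashSet;
import java.util.Set;

public class DistcpUpdateSketch {

    // Bug 1: with -update, a directory that already exists at the
    // destination should not be counted. Hypothetical helper.
    static boolean countsTowardDirCount(String dstDir, boolean update,
                                        Set<String> existingDstDirs) {
        return !(update && existingDstDirs.contains(dstDir));
    }

    // Launch condition as described in the issue (after HADOOP-5762).
    static boolean shouldLaunchJob(int fileCount, int dirCount) {
        return fileCount + dirCount > 0;
    }

    // Bug 2: when a single source file is copied onto an existing
    // destination file, compare checksums against the destination itself.
    // Otherwise (assumption) compare against dst/<source file name>.
    static String checksumTarget(String srcFile, String dst,
                                 boolean dstIsExistingFile) {
        if (dstIsExistingFile) {
            return dst;
        }
        String name = srcFile.substring(srcFile.lastIndexOf('/') + 1);
        return dst.endsWith("/") ? dst + name : dst + "/" + name;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>();
        existing.add("/dst/a");

        // One source dir that already exists at the destination, no files.
        int dirCount = countsTowardDirCount("/dst/a", true, existing) ? 1 : 0;
        System.out.println("launch job: " + shouldLaunchJob(0, dirCount));

        // Single-file -update onto an existing destination file.
        System.out.println("compare against: "
                + checksumTarget("/src/srcfilename", "/dst/destfilename", true));
    }
}
```

In this sketch, skipping the increment for an existing directory leaves `fileCount + dirCount` at zero, so no job is launched, and the single-file case resolves the comparison target to the destination file rather than a path beneath it.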