[ https://issues.apache.org/jira/browse/MAPREDUCE-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085447#comment-14085447 ]
Karthik Kambatla commented on MAPREDUCE-5968:
---------------------------------------------

Patch looks mostly good. A couple of minor comments:
# Do we need to set delWorkDir to false in both places? The latter is always executed and the former can be skipped.
{code}
+      // promote the output to the final location
+      if (!localFs.rename(workDir, finalDir)) {
+        localFs.delete(workDir, true);
+        delWorkDir = false;
+        if (!localFs.exists(finalDir)) {
+          throw new IOException("Failed to promote distributed cache object " +
+                                workDir + " to " + finalDir);
+        }
+        // someone else promoted first
+        return 0;
+      }
+      delWorkDir = false;
{code}
# I understand the "-work-" comes from how the work directory name is generated. Can we create the work directory name in a method that can be accessed from both the production and test code, so the test continues to be useful in the future?
{code}
+    String workDir = destination.getParent().toString() + "-work-";
{code}

> Work directory is not deleted in DistCache if Exception happen in downloadCacheObject.
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5968
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5968
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.2.1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: MAPREDUCE-5968.branch1.patch
>
>
> The work directory is not deleted in DistCache if an exception happens in
> downloadCacheObject. In downloadCacheObject, the cache file is first copied to
> a temporary work directory, and the work directory is then renamed to the
> final directory. If an IOException happens during the copy, the work
> directory is not deleted. This leaves garbage data in the local disk cache.
> For example, if an MR application uses the Distributed Cache to send a very
> large archive or file (50G) and the disk fills up during the copy, an
> IOException is triggered. The work directory is then neither deleted nor
> renamed, and it occupies a large chunk of disk space.
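
As a rough illustration of the cleanup pattern discussed in this thread, the sketch below copies into a work directory, promotes it by rename, and removes the work directory in a finally block on any failure. It is not the patch itself: it uses java.nio.file instead of Hadoop's FileSystem API, and the names WorkDirPromotion, workDirFor, download, and the suffix parameter are hypothetical. It also assumes one way to act on the two review comments: a single delWorkDir assignment after a successful promote, and one helper that builds the "-work-" name so production and test code can share it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class WorkDirPromotion {

    // Hypothetical shared helper: the only place that knows the "-work-" naming rule.
    static Path workDirFor(Path finalDir, long suffix) {
        return finalDir.resolveSibling(finalDir.getFileName() + "-work-" + suffix);
    }

    // Copies source into a work directory, then promotes it to finalDir.
    // Returns the copied size, or 0 if another caller promoted first.
    static long download(Path source, Path finalDir, long suffix) throws IOException {
        Path workDir = workDirFor(finalDir, suffix);
        boolean delWorkDir = true; // assume cleanup is needed until promotion succeeds
        try {
            Files.createDirectories(workDir);
            Path copied = workDir.resolve(source.getFileName());
            Files.copy(source, copied); // may throw, e.g. when the disk is full
            long size = Files.size(copied);
            try {
                // promote the output to the final location
                Files.move(workDir, finalDir);
            } catch (IOException e) {
                if (!Files.exists(finalDir)) {
                    throw new IOException("Failed to promote " + workDir + " to " + finalDir, e);
                }
                // someone else promoted first; the finally block removes our work directory
                return 0;
            }
            delWorkDir = false; // single assignment: the work directory no longer exists
            return size;
        } finally {
            if (delWorkDir) {
                deleteRecursively(workDir);
            }
        }
    }

    // Deletes dir and everything under it, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

With this shape, a test could call workDirFor(finalDir, suffix) to locate the work directory instead of rebuilding the "-work-" string by hand. Leaving delWorkDir true on the "someone else promoted first" path means the finally block performs that delete, which is a design choice of this sketch rather than something stated in the patch.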