[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085447#comment-14085447
 ] 

Karthik Kambatla commented on MAPREDUCE-5968:
---------------------------------------------

Patch looks mostly good. A couple of minor comments: 
# Do we need to set delWorkDir to false in both places? The latter is always 
executed and the former can be skipped.
{code}
+      // promote the output to the final location
+      if (!localFs.rename(workDir, finalDir)) {
+        localFs.delete(workDir, true);
+        delWorkDir = false;
+        if (!localFs.exists(finalDir)) {
+          throw new IOException("Failed to promote distributed cache object " +
+                                workDir + " to " + finalDir);
+        }
+        // someone else promoted first
+        return 0;
+      }
+      delWorkDir = false;
{code}
# I understand the "-work-" comes from how the work directory name is generated. 
Can we create the work directory name in a method that can be accessed from 
both the production and test code, so the test continues to be useful in the 
future? 
{code}
+    String workDir = destination.getParent().toString() + "-work-";
{code}
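
To illustrate the second comment, here is a minimal sketch of a shared naming helper. The class name WorkDirNames, the method names, and the numeric suffix are all hypothetical; only the "-work-" infix comes from the patch. The actual way the work directory name is generated in the production code may differ.

```java
public class WorkDirNames {
    /** Infix taken from the patch; kept in one place so tests cannot drift from it. */
    public static final String WORK_DIR_INFIX = "-work-";

    /** Prefix shared by every work directory created for the given final directory. */
    public static String workDirPrefix(String finalDir) {
        return finalDir + WORK_DIR_INFIX;
    }

    /** Full work directory name. The numeric suffix is an assumption for illustration. */
    public static String workDirName(String finalDir, long suffix) {
        return workDirPrefix(finalDir) + suffix;
    }

    public static void main(String[] args) {
        String finalDir = "/tmp/cache/12345";
        String workDir = workDirName(finalDir, 42L);
        // A test can locate leftover work directories through the same helper.
        System.out.println(workDir.startsWith(workDirPrefix(finalDir)));
    }
}
```

With something along these lines, the test could call workDirPrefix(...) instead of hard-coding the string, and would keep working if the naming scheme changes.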

> Work directory is not deleted in  DistCache if Exception happen in 
> downloadCacheObject.
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5968
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5968
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.2.1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: MAPREDUCE-5968.branch1.patch
>
>
> The work directory is not deleted in DistCache if an exception happens in 
> downloadCacheObject. In downloadCacheObject, the cache file is first copied to 
> a temporary work directory, and the work directory is then renamed to the 
> final directory. If an IOException happens during the copy, the work 
> directory is not deleted, which leaves garbage data in the local disk cache. 
> For example, if an MR application uses the Distributed Cache to send a very 
> large archive or file (50G) and the disk fills up during the copy, an 
> IOException is triggered, the work directory is neither deleted nor renamed, 
> and it occupies a large chunk of disk space.
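
The cleanup behaviour described in the issue can be sketched as a try/finally guarded by a single flag. This is not the Hadoop code: it uses java.nio.file as a stand-in for the local FileSystem API, the class WorkDirCleanup and the Copier interface are hypothetical, and it does not handle the case where another process promotes the directory first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class WorkDirCleanup {
    /** Hypothetical stand-in for the step that copies the cache object. */
    public interface Copier {
        void copyInto(Path workDir) throws IOException;
    }

    public static void download(Path workDir, Path finalDir, Copier copier)
            throws IOException {
        boolean delWorkDir = true;          // assume failure until promotion succeeds
        try {
            Files.createDirectories(workDir);
            copier.copyInto(workDir);       // may throw, e.g. when the disk is full
            Files.move(workDir, finalDir);  // promote the output to the final location
            delWorkDir = false;             // cleared in exactly one place
        } finally {
            if (delWorkDir) {
                deleteRecursively(workDir); // no garbage left behind on failure
            }
        }
    }

    /** Best-effort recursive delete, children before parents. */
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            // Swallowed so the original exception from the try block is not masked.
        }
    }
}
```

Because the flag is cleared only after the move succeeds, any exception thrown during the copy or the promotion leaves it set, and the finally block removes the work directory.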



--
This message was sent by Atlassian JIRA
(v6.2#6252)
