[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767702#comment-13767702
 ] 

Xi Fang commented on MAPREDUCE-5508:
------------------------------------

Thanks [~sandyr] and [~cnauroth]. Actually, the discussion above gave me 
second thoughts about the attached patch. There is a race condition here. Suppose 
that Path#getFileSystem in CleanupQueue#deletePath retrieves the same instance 
as JobInProgress#fs from FileSystem#Cache. Because there is a race 
condition between DistributedFileSystem#close() and FileSystem#close(), it is 
possible that, just after JobInProgress#cleanupJob has closed 
JobInProgress#fs's DFSClient, the processor switches to CleanupQueue#deletePath, 
which then calls fs.delete(). Because this fs's DFSClient has already been closed, 
an exception would be thrown and the staging directory would not be deleted.
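
The interleaving described above can be replayed deterministically in a small 
sketch. This is not Hadoop code: FakeFs and FakeCache are hypothetical stand-ins 
for a cached FileSystem and for FileSystem#Cache, and the sketch only assumes, as 
in the scenario above, that both callers receive the same cached instance. The 
steps run sequentially in the order that triggers the failure rather than on 
real threads.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class SharedFsRaceSketch {

    /** Hypothetical stand-in for a cached FileSystem whose client can be closed. */
    static final class FakeFs {
        private boolean clientClosed = false;

        /** Models closing the underlying client (DFSClient in the real code). */
        synchronized void close() {
            clientClosed = true;
        }

        /** Fails once the client has been closed, as fs.delete() would. */
        synchronized boolean delete(String path) throws IOException {
            if (clientClosed) {
                throw new IOException("Filesystem closed");
            }
            return true;
        }
    }

    /** Hypothetical stand-in for FileSystem#Cache: one shared instance per key. */
    static final class FakeCache {
        private static final Map<String, FakeFs> CACHE = new HashMap<>();

        static synchronized FakeFs get(String key) {
            return CACHE.computeIfAbsent(key, k -> new FakeFs());
        }
    }

    /**
     * Replays the problematic ordering: the job-side handle is closed first,
     * then the cleanup side tries to delete through the same cached instance.
     * Returns true only if the delete succeeded.
     */
    static boolean replayInterleaving(String key, String stagingDir) {
        FakeFs jobFs = FakeCache.get(key);      // plays the role of JobInProgress#fs
        FakeFs cleanupFs = FakeCache.get(key);  // what CleanupQueue#deletePath would obtain
        jobFs.close();                          // cleanupJob closes the client first
        try {
            return cleanupFs.delete(stagingDir);
        } catch (IOException e) {
            return false;                       // staging directory is left behind
        }
    }

    public static void main(String[] args) {
        boolean same = FakeCache.get("hdfs://nn") == FakeCache.get("hdfs://nn");
        System.out.println("same instance: " + same);
        System.out.println("staging dir deleted: "
                + replayInterleaving("hdfs://nn", "/staging/job_1"));
    }
}
```

In this sketch the delete fails only because both handles refer to one shared 
object; if the two sides held distinct instances, closing one would not affect 
the other.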


                
> JobTracker memory leak caused by unreleased FileSystem objects in 
> JobInProgress#cleanupJob
> ------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5508
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 1-win, 1.2.1
>            Reporter: Xi Fang
>            Assignee: Xi Fang
>            Priority: Critical
>         Attachments: MAPREDUCE-5508.patch
>
>
> MAPREDUCE-5351 fixed a memory leak problem but introduced another filesystem 
> object (see "tempDirFs") that is not properly released.
> {code} JobInProgress#cleanupJob()
>   void cleanupJob() {
> ...
>           tempDirFs = jobTempDirPath.getFileSystem(conf);
>           CleanupQueue.getInstance().addToQueue(
>               new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
> ...
>  if (tempDirFs != fs) {
>       try {
>         fs.close();
>       } catch (IOException ie) {
> ...
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
