[ 
https://issues.apache.org/jira/browse/HADOOP-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812539#comment-16812539
 ] 

Jim Brennan commented on HADOOP-15372:
--------------------------------------

[~miklos.szeg...@cloudera.com], [~ebadger], I recently debugged a case where we 
were (still) leaking tmp dirs for localized tarballs in our 2.8 code.  The 
problem turned out to be not that we were failing to kill all the shells, but 
that we were only killing the first subshell in the tar command, which was: 
{{gzip -dc inFile | ( cd untarDir; tar -xf)}}
When I tried to reproduce the problem in 3.x (trunk), I could not get it to 
happen.
I believe this was fixed by YARN-2185, which changed the localization code to 
use runCommandOnStream().  Because that code uses separate threads for the 
input and output of the shell command, the command is killed when those 
threads are killed.
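
To illustrate the pattern of feeding a stream to a command from worker 
threads, here is a minimal sketch. It is not the actual Hadoop 
runCommandOnStream() code; the class name StreamingCommandSketch and the 
method runOnStream are made up for this example, and it assumes a Unix-like 
system for the command being run.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not the Hadoop FileUtil implementation.
final class StreamingCommandSketch {

  // Copies everything from in to out using a fixed-size buffer.
  private static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
  }

  // Feeds input to the command's stdin and returns its combined stdout/stderr.
  public static String runOnStream(InputStream input, String... command)
      throws IOException, InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    Process started = null;
    try {
      started = new ProcessBuilder(command).redirectErrorStream(true).start();
      final Process process = started;

      // Reader thread: drain the output so the child never blocks on a full pipe.
      Future<String> output = pool.submit(() -> {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream out = process.getInputStream()) {
          copy(out, sink);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
      });

      // Writer thread: stream the input, then close stdin to signal end of input.
      Future<Void> writer = pool.submit(() -> {
        try (OutputStream stdin = process.getOutputStream()) {
          copy(input, stdin);
        }
        return null;
      });

      writer.get();
      String result = output.get();
      int exit = process.waitFor();
      if (exit != 0) {
        throw new IOException("command exited with status " + exit);
      }
      return result;
    } finally {
      // Runs on every exit path: interrupt the I/O threads and destroy the child.
      pool.shutdownNow();
      if (started != null) {
        started.destroy();
      }
    }
  }
}
```

In this sketch the child process is owned by a single method, so an exception 
or interrupt in either I/O thread ends in the finally block, which destroys 
the process.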

So I think this Jira can be closed.  Do you guys agree?

> Race conditions and possible leaks in the Shell class
> -----------------------------------------------------
>
>                 Key: HADOOP-15372
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15372
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.10.0, 3.2.0
>            Reporter: Miklos Szegedi
>            Assignee: Eric Badger
>            Priority: Minor
>         Attachments: HADOOP-15372.001.patch
>
>
> YARN-5641 introduced some cleanup code in the Shell class. It has a race 
> condition: {{Shell.runCommand()}} can be called while or after 
> {{Shell.getAllShells()}} has returned the shells to be cleaned up. The new 
> thread can escape the cleanup, so the process it holds can leak, leaving 
> behind localized files, etc.
> I see another issue as well. {{Shell.runCommand()}} has a finally block with 
> a {{process.destroy();}} to clean up. However, the try block does not 
> cover all instructions after the process is started, so, for example, we can 
> exit the thread and leak the process if 
> {{timeOutTimer.schedule(timeoutTimerTask, timeOutInterval);}} throws an 
> exception.
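
As a rough illustration of the second point in the description, the sketch 
below starts the try block immediately after the process is started, so that 
a failure while setting up the timeout timer still reaches the 
{{process.destroy()}} call. It is not the Hadoop Shell class; the names 
GuardedProcessSketch and run are hypothetical, and the commands used with it 
are assumed to exist on a Unix-like system.

```java
import java.io.IOException;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical illustration only; not the Hadoop Shell implementation.
final class GuardedProcessSketch {

  // Runs a command and returns its exit status. The child is destroyed on
  // every exit path, including a failure while scheduling the timeout.
  public static int run(long timeoutMillis, String... command)
      throws IOException, InterruptedException {
    Process process = new ProcessBuilder(command).inheritIO().start();
    Timer timer = null;
    try {
      // Everything after start() sits inside the try, timer setup included.
      if (timeoutMillis > 0) {
        timer = new Timer("process-timeout", true);
        timer.schedule(new TimerTask() {
          @Override
          public void run() {
            process.destroy();
          }
        }, timeoutMillis);
      }
      return process.waitFor();
    } finally {
      if (timer != null) {
        timer.cancel();
      }
      // Harmless if the process has already exited.
      process.destroy();
    }
  }
}
```

For example, {{GuardedProcessSketch.run(200, "sleep", "5")}} should return 
well before five seconds have passed, because the timer task destroys the 
child once the timeout expires.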



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
