The calling graph is very useful. Thanks, Vinod.

I traced the code and enabled debug logging, and found something
interesting.

While the AM was running, I ran "ps aux | grep SampleAM" and found two
running processes.

34990  /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java SampleAM
34984  /bin/bash -c /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java SampleAM 1>/tmp/logs/application_1346348588670_0011/container_1346348588670_0011_01_000001/stdout 2>/tmp/logs/application_1346348588670_0011/container_1346348588670_0011_01_000001/stderr

After the kill, I found the following in the NM log.

12/08/30 13:29:27,542 DEBUG [AsyncDispatcher event handler] nodemanager.DefaultContainerExecutor: Sending signal 15 to pid 34984 as user bo.wang
12/08/30 13:29:27,836 DEBUG [Task killer for 34984] nodemanager.DefaultContainerExecutor: Sending signal 9 to pid 34984 as user bo.wang
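
The two log entries show signal 15 (SIGTERM) followed about 300 ms later by
signal 9 (SIGKILL). As a rough illustration of that escalation pattern, here
is a minimal Python sketch. It is not the NM's code: terminate_then_kill,
start_stubborn_child, and the grace period are hypothetical, and this sketch
only sends SIGKILL if the process is still alive after the grace period,
which may differ from what the NM does.

```python
import signal
import subprocess
import sys


def terminate_then_kill(proc, grace_seconds=0.25):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "exited after SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.wait()  # reap the child so it does not linger as a zombie
        return "needed SIGKILL"


def start_stubborn_child():
    """Hypothetical test child: ignores SIGTERM, reports readiness, sleeps."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(30)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    proc.stdout.readline()  # block until the child is ignoring SIGTERM
    return proc


if __name__ == "__main__":
    child = start_stubborn_child()
    print(terminate_then_kill(child), child.returncode)
    child.stdout.close()
```

A process that ignores SIGTERM is only stopped by the second signal, which
matches what I saw when "kill -9" was needed.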

It looks like the NM only signals process 34984 (the bash wrapper), not
34990 (the JVM that the wrapper launched). As a result, process 34990 is
still running after the kill.

Is this a bug? BTW, I am running this on my MacBook, which may be why
YARN is using DefaultContainerExecutor rather than LinuxContainerExecutor.

Thanks,
Bo

On Wed, Aug 29, 2012 at 5:23 PM, Vinod Kumar Vavilapalli <
[email protected]> wrote:

>
> Please attach your jstack dump; maybe I can spot something.
>
> Pointer for what you asked: ContainerManagerImpl.stopContainer() ->
> ContainerImpl.KillTransition -> ContainersLauncher ->
> ContainerLaunch.cleanupContainer(). Follow the events carefully.
>
> HTH,
> +Vinod
>
> On Aug 29, 2012, at 3:28 PM, Bo Wang wrote:
>
> > Hi Vinod,
> >
> > Thanks for the suggestion. I was involved with some other issues before
> > getting back to this one. Sorry for replying late.
> >
> > I tried to kill the process with "kill -3" but it was not interrupted.
> > Then I used "kill -9", which sent a SIGKILL, and the process was killed.
> > I checked the stderr and used jstack to dump the stack trace. Things
> > look normal. Actually, I simplified my test AM to be just an empty
> > while loop.
> >
> > I looked into the code to find where the SIGKILL is sent in YARN but didn't
> > find it. I traced down to NodeManager.stopContainer, but didn't see it there.
> > Would you mind sending me a pointer to the actual code?
> >
> > Thanks,
> > Bo
> >
> > On Wed, Aug 22, 2012 at 7:29 PM, Vinod Kumar Vavilapalli <
> > [email protected]> wrote:
> >
> >>
> >>> I am not sure when to grab the stack trace of the AM. In the
> >>> stdout/stderr of AM, no stack trace (or exception) is emitted.
> >>
> >>
> >> You can log in to the node and, if the process is still alive, do a
> >> "kill -3", which will dump the threads' status to stderr.
> >>
> >>
> >>> Btw, I am curious how NM kills a container. Does it directly kill the
> >>> JVM process?
> >>
> >>
> >> NM directly kills the JVM with a SIGTERM followed by a SIGKILL.
> >>
> >> BTW, please also check the corresponding NM's logs for any
> >> exception/error, which could mean a bug in NM code.
> >>
> >> HTH,
> >> +Vinod
>
>
