Re: Samza job killed but left orphaned on YARN

David Yu Wed, 18 May 2016 10:33:17 -0700

From the NM log, I'm seeing:

2016-05-18 06:29:06,248 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_e01_1463512986427_0007_01_000002
2016-05-18 06:29:06,265 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Application *application_1463512986427_0007* transitioned from RUNNING to
FINISHING_CONTAINERS_WAIT


(The *highlighted* ID is the Samza application in question.)

The application never transitioned out of FINISHING_CONTAINERS_WAIT :(
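
For anyone who wants to check for the same symptom, here is a minimal sketch (not an official tool) that scans NM log text for "transitioned from X to Y" messages and reports applications whose last logged state is FINISHING_CONTAINERS_WAIT. The function names are hypothetical, and the regex assumes the message format shown in the log excerpt above; adjust it if your log layout differs.

```python
import re
from typing import Dict, Iterable, List

# Matches the NM message "Application <id> transitioned from <OLD> to <NEW>".
# The optional asterisks only tolerate the emphasis markers used in this email.
TRANSITION = re.compile(
    r"Application\s+\*?(application_\d+_\d+)\*?\s+"
    r"transitioned from\s+(\w+)\s+to\s+(\w+)"
)


def last_states(lines: Iterable[str]) -> Dict[str, str]:
    """Return the most recently logged state for each application ID."""
    # Join the lines so that messages wrapped across lines still match.
    text = " ".join(line.strip() for line in lines)
    states: Dict[str, str] = {}
    for match in TRANSITION.finditer(text):
        # Later matches overwrite earlier ones, so the last transition wins.
        states[match.group(1)] = match.group(3)
    return states


def stuck_apps(lines: Iterable[str],
               state: str = "FINISHING_CONTAINERS_WAIT") -> List[str]:
    """List application IDs whose last logged state equals `state`."""
    return sorted(app for app, s in last_states(lines).items() if s == state)
```

Usage would be something like `stuck_apps(open(path))`, where `path` is the location of your NodeManager log. The whole file is read into memory, which is fine for a quick check but not for very large logs.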



On Wed, May 18, 2016 at 10:21 AM, David Yu <david...@optimizely.com> wrote:

> Jacob,
>
> I have checked and made sure that NM is running on the node:
>
> $ ps aux | grep java
> ...
> yarn     25623  0.5  0.8 2366536 275488 ?      Sl   May17   7:04
> /usr/java/jdk1.8.0_51/bin/java -Dproc_nodemanager
>  ... org.apache.hadoop.yarn.server.nodemanager.NodeManager
>
>
>
> Thanks,
> David
>
> On Wed, May 18, 2016 at 7:08 AM, Jacob Maes <jacob.m...@gmail.com> wrote:
>
>> Hey David,
>>
>> The only time I've seen orphaned containers is when the NM dies. If the NM
>> isn't running, the RM has no means to kill the containers on a node. Can
>> you verify that the NM was healthy at the time of the shut down?
>>
>> If it wasn't healthy and/or it was restarted, one option that may help is
>> NM Recovery:
>>
>> https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html
>>
>> With NM Recovery, the NM will resume control over containers that were
>> running when the NM shut down. This option has virtually eliminated
>> orphaned containers in our clusters.
>>
>> -Jake
>>
>> On Tue, May 17, 2016 at 11:54 PM, David Yu <david...@optimizely.com>
>> wrote:
>>
>> > Samza version = 0.10.0
>> > YARN version = Hadoop 2.6.0-cdh5.4.9
>> >
>> > We are experiencing issues when killing a Samza job:
>> >
>> > $ yarn application -kill application_1463512986427_0007
>> >
>> > Killing application application_1463512986427_0007
>> >
>> > 16/05/18 06:29:05 INFO impl.YarnClientImpl: Killed application
>> > application_1463512986427_0007
>> >
>> > RM shows that the job is killed. However, the samza containers are still
>> > left running.
>> >
>> > Any idea why this is happening?
>> >
>> > Thanks,
>> > David
>> >
>>
>
>
