From the NM log, I'm seeing:

2016-05-18 06:29:06,248 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_e01_1463512986427_0007_01_000002
2016-05-18 06:29:06,265 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application *application_1463512986427_0007* transitioned from RUNNING to FINISHING_CONTAINERS_WAIT
(*Highlighted* is the particular Samza application.) The status never transitioned out of FINISHING_CONTAINERS_WAIT :(

On Wed, May 18, 2016 at 10:21 AM, David Yu <david...@optimizely.com> wrote:

> Jacob,
>
> I have checked and made sure that the NM is running on the node:
>
> $ ps aux | grep java
> ...
> yarn 25623 0.5 0.8 2366536 275488 ? Sl May17 7:04
> /usr/java/jdk1.8.0_51/bin/java -Dproc_nodemanager
> ... org.apache.hadoop.yarn.server.nodemanager.NodeManager
>
> Thanks,
> David
>
> On Wed, May 18, 2016 at 7:08 AM, Jacob Maes <jacob.m...@gmail.com> wrote:
>
>> Hey David,
>>
>> The only time I've seen orphaned containers is when the NM dies. If the NM
>> isn't running, the RM has no means to kill the containers on a node. Can
>> you verify that the NM was healthy at the time of the shutdown?
>>
>> If it wasn't healthy and/or it was restarted, one option that may help is
>> NM Recovery:
>>
>> https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html
>>
>> With NM Recovery, the NM will resume control over containers that were
>> running when the NM shut down. This option has virtually eliminated
>> orphaned containers in our clusters.
>>
>> -Jake
>>
>> On Tue, May 17, 2016 at 11:54 PM, David Yu <david...@optimizely.com>
>> wrote:
>>
>> > Samza version = 0.10.0
>> > YARN version = Hadoop 2.6.0-cdh5.4.9
>> >
>> > We are experiencing issues when killing a Samza job:
>> >
>> > $ yarn application -kill application_1463512986427_0007
>> >
>> > Killing application application_1463512986427_0007
>> >
>> > 16/05/18 06:29:05 INFO impl.YarnClientImpl: Killed application
>> > application_1463512986427_0007
>> >
>> > The RM shows that the job is killed. However, the Samza containers are
>> > still left running.
>> >
>> > Any idea why this is happening?
>> >
>> > Thanks,
>> > David
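For reference, the NM Recovery option Jake links to is switched on per NodeManager in yarn-site.xml. Below is a minimal sketch based on the property names on the linked Hadoop 2.7.2 NodeManagerRestart page. The recovery directory path and port 45454 are placeholder values chosen for illustration. Whether Hadoop 2.6.0-cdh5.4.9 behaves exactly like the 2.7.2 docs describe is an assumption worth checking against the Cloudera documentation.

```xml
<!-- yarn-site.xml on each NodeManager (sketch; values are placeholders) -->
<configuration>
  <!-- Persist container state so a restarted NM can reattach to running containers -->
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Local directory where the NM stores its recovery state (placeholder path) -->
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/lib/hadoop-yarn/yarn-nm-recovery</value>
  </property>
  <!-- Recovery needs a fixed NM RPC port; the default ephemeral port (0) can change across restarts -->
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
</configuration>
```

Per the linked page, the fixed port matters because an NM that comes back on a different port cannot be matched with its earlier incarnation.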