[
https://issues.apache.org/jira/browse/APEXCORE-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106220#comment-16106220
]
Vlad Rozov commented on APEXCORE-770:
-------------------------------------
[~sanjaypujare] Yes, the NPE is caused by allocatedContainer being null,
because the container had already been removed from allocatedContainers.
Please add your comments regarding the static inner class to the PR.
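
To illustrate the failure mode described above, here is a minimal sketch of a null-guarded lookup. It is not the actual StreamingAppMasterService code or the fix in the PR: the class CompletedContainerSketch, the method onContainerCompleted, the field stopRequested, and the use of a String key are all assumptions made for illustration. Only the names allocatedContainers and AllocatedContainer come from this issue.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names other than allocatedContainers and
// AllocatedContainer are invented for illustration.
class CompletedContainerSketch {

    // Simplified stand-in for the AllocatedContainer inner class.
    static final class AllocatedContainer {
        final String containerId;
        boolean stopRequested; // assumed field, not taken from the real class

        AllocatedContainer(String containerId) {
            this.containerId = containerId;
        }
    }

    // Containers currently tracked, keyed by container id (key type assumed).
    static final Map<String, AllocatedContainer> allocatedContainers =
            new ConcurrentHashMap<>();

    /**
     * Handles a completion notification for the given container id.
     * Returns true if a tracked entry was found and handled, false if the
     * entry had already been removed and the notification was skipped.
     */
    static boolean onContainerCompleted(String containerId) {
        AllocatedContainer allocatedContainer = allocatedContainers.remove(containerId);
        if (allocatedContainer == null) {
            // The entry was removed earlier (for example when the container
            // was killed); dereferencing it here would throw an NPE.
            return false;
        }
        allocatedContainer.stopRequested = true;
        return true;
    }

    public static void main(String[] args) {
        String id = "container_e47_1499808956620_0716_01_000090";
        allocatedContainers.put(id, new AllocatedContainer(id));
        System.out.println(onContainerCompleted(id)); // first notification
        System.out.println(onContainerCompleted(id)); // duplicate notification
    }
}
```

In this sketch a duplicate or late completion notification is ignored rather than dereferenced. Whether skipping is the right behavior for the application master is a question for the PR.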
> Application is killed due to NPE in ApplicationMaster
> -----------------------------------------------------
>
> Key: APEXCORE-770
> URL: https://issues.apache.org/jira/browse/APEXCORE-770
> Project: Apache Apex Core
> Issue Type: Bug
> Reporter: Vinay Bangalore Srikanth
> Assignee: Sandesh
>
> In my Apex application, I was randomly killing different containers (except
> the app master).
> The application got killed unexpectedly with the following exception:
> {noformat}
> 2017-07-25 11:24:51,681 WARN com.datatorrent.stram.StreamingAppMasterService:
> Failed to stop container container_e47_1499808956620_0716_01_000090
> org.apache.hadoop.yarn.exceptions.YarnException: Container
> container_e47_1499808956620_0716_01_000090 is neither started nor scheduled
> to start
> at
> org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
> at
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.stopContainerAsync(NMClientAsyncImpl.java:234)
> at
> com.datatorrent.stram.StreamingAppMasterService.sendContainerAskToRM(StreamingAppMasterService.java:1175)
> at
> com.datatorrent.stram.StreamingAppMasterService.execute(StreamingAppMasterService.java:865)
> at
> com.datatorrent.stram.StreamingAppMasterService.run(StreamingAppMasterService.java:671)
> at
> com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:106)
> 2017-07-25 11:24:51,681 INFO com.datatorrent.stram.StreamingAppMasterService:
> Requested stop container container_e47_1499808956620_0716_01_000090
> 2017-07-25 11:24:51,681 INFO
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl: Processing
> Event EventType: STOP_CONTAINER for Container
> container_e47_1499808956620_0716_01_000090
> 2017-07-25 11:24:51,681 INFO
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl: Container
> container_e47_1499808956620_0716_01_000090 is already stopped or failed
> 2017-07-25 11:24:51,686 INFO com.datatorrent.stram.StreamingContainerManager:
> Initiating recovery for
> [email protected]:8041
> 2017-07-25 11:24:51,686 INFO com.datatorrent.stram.StreamingContainerManager:
> Affected operators [PTOperator[id=38,name=passthrough,state=ACTIVE],
> PTOperator[id=105,name=passthrough.output#unifier,state=ACTIVE],
> PTOperator[id=97,name=console,state=ACTIVE],
> PTOperator[id=106,name=passthrough.output#unifier,state=ACTIVE],
> PTOperator[id=103,name=console,state=ACTIVE],
> PTOperator[id=107,name=passthrough.output#unifier,state=ACTIVE],
> PTOperator[id=100,name=console,state=ACTIVE],
> PTOperator[id=108,name=passthrough.output#unifier,state=ACTIVE],
> PTOperator[id=99,name=console,state=ACTIVE],
> PTOperator[id=109,name=passthrough.output#unifier,state=ACTIVE],
> PTOperator[id=101,name=console,state=ACTIVE],
> PTOperator[id=110,name=passthrough.output#unifier,state=ACTIVE],
> PTOperator[id=102,name=console,state=ACTIVE],
> PTOperator[id=111,name=passthrough.output#unifier,state=ACTIVE],
> PTOperator[id=98,name=console,state=ACTIVE],
> PTOperator[id=112,name=passthrough.output#unifier,state=ACTIVE],
> PTOperator[id=104,name=console,state=ACTIVE],
> PTOperator[id=68,name=randomGenerator.out#unifier,state=ACTIVE]]
> 2017-07-25 11:24:52,260 ERROR
> com.datatorrent.stram.StreamingContainerManager: Unknown container
> container_e47_1499808956620_0716_01_000090
> 2017-07-25 11:24:52,263 INFO com.datatorrent.stram.StreamingContainerParent:
> child msg: [container_e47_1499808956620_0716_01_000090] Exiting heartbeat
> loop.. context:
> PTContainer[id=38(container_e47_1499808956620_0716_01_000090),state=KILLED,operators=[PTOperator[id=38,name=passthrough,state=PENDING_DEPLOY],
> PTOperator[id=68,name=randomGenerator.out#unifier,state=PENDING_DEPLOY]]]
> 2017-07-25 11:24:52,697 INFO com.datatorrent.stram.ResourceRequestHandler:
> Strict anti-affinity = [] for container with operators
> PTOperator[id=38,name=passthrough,state=PENDING_DEPLOY],PTOperator[id=68,name=randomGenerator.out#unifier,state=PENDING_DEPLOY]
> 2017-07-25 11:24:52,698 INFO com.datatorrent.stram.ResourceRequestHandler:
> Found host null
> 2017-07-25 11:24:52,698 INFO
> com.datatorrent.stram.BlacklistBasedResourceRequestHandler: No node specific
> request
> 2017-07-25 11:24:53,710 INFO
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl: Replacing token for :
> node18.morado.com:8041
> 2017-07-25 11:24:53,710 INFO com.datatorrent.stram.StreamingAppMasterService:
> Got new container., containerId=container_e47_1499808956620_0716_02_000034,
> containerNode=node18.morado.com:8041,
> containerNodeURI=node18.morado.com:8042, containerResourceMemory4096,
> priority32
> 2017-07-25 11:24:53,710 INFO com.datatorrent.stram.StreamingContainerManager:
> Removing container agent container_e47_1499808956620_0716_01_000090
> 2017-07-25 11:24:53,711 INFO com.datatorrent.stram.LaunchContainerRunnable:
> Setting up container launch context for
> containerid=container_e47_1499808956620_0716_02_000034
> 2017-07-25 11:24:53,711 INFO com.datatorrent.stram.LaunchContainerRunnable:
> CLASSPATH:
> ./*:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:.
> 2017-07-25 11:24:53,946 INFO
> com.datatorrent.common.util.BasicContainerOptConfigurator: property map for
> operator {Generic=null, -Xmx=1920m}
> 2017-07-25 11:24:53,947 INFO
> com.datatorrent.common.util.BasicContainerOptConfigurator: property map for
> operator {Generic=null, -Xmx=768m}
> 2017-07-25 11:24:53,947 INFO com.datatorrent.stram.LaunchContainerRunnable:
> Jvm opts -Xmx3355443200 for container
> container_e47_1499808956620_0716_02_000034
> 2017-07-25 11:24:53,947 INFO com.datatorrent.stram.LaunchContainerRunnable:
> Launching on node: node18.morado.com:8041 command: $JAVA_HOME/bin/java
> -Xmx3355443200
> -Ddt.attr.APPLICATION_PATH=hdfs://node18.morado.com:8020/user/vinay/datatorrent/apps/application_1499808956620_0716
> -Djava.io.tmpdir=$PWD/tmp
> -Ddt.cid=container_e47_1499808956620_0716_02_000034
> -Dhadoop.root.logger=INFO,RFA -Dhadoop.log.dir=<LOG_DIR>
> -Dapex.application.name=$'SlowConsumerTimeoutWindowCountSet.apa'
> com.datatorrent.stram.engine.StreamingContainer 1><LOG_DIR>/stdout
> 2><LOG_DIR>/stderr
> 2017-07-25 11:24:53,947 INFO
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl: Processing
> Event EventType: START_CONTAINER for Container
> container_e47_1499808956620_0716_02_000034
> 2017-07-25 11:24:53,947 INFO com.datatorrent.stram.StreamingAppMasterService:
> Completed containerId=container_e47_1499808956620_0716_01_000090,
> state=COMPLETE, exitStatus=0, diagnostics=
> 2017-07-25 11:24:53,947 INFO com.datatorrent.stram.StreamingAppMasterService:
> Container completed successfully.,
> containerId=container_e47_1499808956620_0716_01_000090
> 2017-07-25 11:24:53,947 INFO
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy:
> Opening proxy : node18.morado.com:8041
> 2017-07-25 11:24:53,948 ERROR com.datatorrent.stram.StreamingAppMaster:
> Exiting Application Master
> java.lang.NullPointerException
> at
> com.datatorrent.stram.StreamingAppMasterService$AllocatedContainer.access$1000(StreamingAppMasterService.java:1251)
> at
> com.datatorrent.stram.StreamingAppMasterService.execute(StreamingAppMasterService.java:1014)
> at
> com.datatorrent.stram.StreamingAppMasterService.run(StreamingAppMasterService.java:671)
> at
> com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:106)
> {noformat}