Hi Haosu,

Do you have the application master log? It will show whether the AM actually received the request and tried to terminate the container. Also, if you have access to the cluster, please check whether the container process has actually terminated. We've seen cases where the container process keeps running because of an unterminated user thread, so the container is never returned to YARN.
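To illustrate the unterminated-thread case: a JVM exits only once every non-daemon thread has finished, so a worker thread that the runnable starts and never stops can keep the container process alive. Below is a minimal sketch using only the JDK that lists the non-daemon threads still alive. The class name and the "leaked-worker" thread are made up for illustration and are not Twill code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Twill.
final class UserThreadCheck {

    /** Returns the names of all live non-daemon threads in this JVM. */
    static List<String> liveUserThreads() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated leak: a non-daemon thread that blocks until interrupted.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                // Interrupted: fall through and let the thread end.
            }
        }, "leaked-worker");
        worker.start();

        // The worker shows up here; without the interrupt below,
        // the JVM would not exit when main returns.
        System.out.println("Live user threads: " + liveUserThreads());

        worker.interrupt();
        worker.join();
        System.out.println("After cleanup: " + liveUserThreads());
    }
}
```

You could log something like this at the end of your runnable's stop path, or simply take a jstack dump of the container process on the node and look for non-daemon threads that belong to your code.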
Terence

On Tue, Aug 9, 2016 at 2:22 AM, ...the end <549198...@qq.com> wrote:

> Hi Terence,
>
> I'm a user of Apache Twill and I have a question about changing the
> instance number.
>
> I'm using twill-incubating-0.7.0.0, yarn-2.6.4, zookeeper-3.4.6. When I
> increase the number of instances, it runs well. But when I try to
> decrease the instances, I think there is something wrong.
>
> Here is the log:
> 16:48:53.365 [ STARTING-SendThread(cp01-yarn-test1:2181)] DEBUG
> org.apache.zookeeper.ClientCnxn - Reading reply
> sessionid:0x155de15272f0105, packet:: clientPath:/Cpp-Application/
> 8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg
> serverPath:/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg
> finished:false header:: 21,1 replyHeader:: 21,3320,0 request::
> '/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg,#
> 7b2274797065223a2253595354454d222c2273636f7065223a2252554e4e
> 41424c45222c2272756e6e61626c654e616d65223a226665746368657222
> 2c22636f6d6d616e64223a7b22636f6d6d616e64223a22696e7374616e63
> 6573222c226f7074696f6e73223a7b22636f756e74223a2233227d7d7d,v{s{31,s{'world,'anyone}}},2
> response:: '/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/
> messages/msg0000000001
> 16:48:53.369 [ STARTING-SendThread(cp01-yarn-test1:2181)] DEBUG
> org.apache.zookeeper.ClientCnxn - Reading reply
> sessionid:0x155de15272f0105, packet:: clientPath:/Cpp-Application/
> 8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg0000000001
> serverPath:/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg0000000001
> finished:false header:: 22,3 replyHeader:: 22,3320,0 request::
> '/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg0000000001,T
> response:: s{3320,3320,1470732533363,1470732533363,0,0,0,0,119,0,3320}
> 16:48:56.706 [ STARTING-SendThread(cp01-yarn-test1:2181)] DEBUG
> org.apache.zookeeper.ClientCnxn - Got ping response for sessionid:
> 0x155de15272f0105 after 0ms
> 16:48:59.354 [ STARTING-SendThread(cp01-yarn-test1:2181)] DEBUG
> org.apache.zookeeper.ClientCnxn - Got notification
> sessionid:0x155de15272f0105
> 16:48:59.354 [ STARTING-SendThread(cp01-yarn-test1:2181)] DEBUG
> org.apache.zookeeper.ClientCnxn - Got WatchedEvent state:SyncConnected
> type:NodeDeleted path:/Cpp-Application/8047096c-0a25-40ec-8f21-
> ca8569c40f8c/messages/msg0000000001 for sessionid 0x155de15272f0105
>
> But the number of running containers shown on the 'Nodes of the Cluster'
> page in YARN does not decrease.
>
> I use the API like this:
>
>     Future<Integer> future = twillController.changeInstances(name, num);
>     JsonObject result = new JsonObject();
>     try {
>         int newCount = future.get();
>         result.addProperty("status", 0);
>         result.addProperty("new_count", newCount);
>     } catch (InterruptedException | ExecutionException e) {
>         result.addProperty("status", -1);
>         result.addProperty("errMsg", e.getMessage());
>         LOG.error("set container number error", e.getMessage());
>     }
>
> Do you have any idea why this does not work? Hoping for your response.
> Thank you!
>
> Haosu Guo
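
A note on the log above: the hex string in the first ZooKeeper entry is the JSON command message written under the messages node. Decoding it gives a SYSTEM message for the runnable "fetcher" carrying an "instances" command with count 3, so the client did write the request. That is only the client side, which is why the AM log is still needed. A small JDK-only sketch for decoding such payloads (the class and constant names are made up for illustration):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for reading hex payloads from ZooKeeper DEBUG logs.
final class ZkPayloadDecoder {

    // Hex payload copied from the first log entry, with the mail wraps joined.
    static final String LOGGED_PAYLOAD =
        "7b2274797065223a2253595354454d222c2273636f7065223a2252554e4e"
      + "41424c45222c2272756e6e61626c654e616d65223a226665746368657222"
      + "2c22636f6d6d616e64223a7b22636f6d6d616e64223a22696e7374616e63"
      + "6573222c226f7074696f6e73223a7b22636f756e74223a2233227d7d7d";

    /** Decodes a string of two-digit hex bytes into UTF-8 text. */
    static String decodeHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the JSON command message contained in the logged payload.
        System.out.println(decodeHex(LOGGED_PAYLOAD));
    }
}
```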