Hi Terence,
Thank you for your response. I'm sure the AM received the request from the
TwillController. Here is the log:
2016-08-09T18:25:00,100Z INFO c.b.b.o.c.CppShellRunnable [cp01-yarn-test2]
[TwillContainerService] CppShellRunnable:run(CppShellRunnable.java:87) - sleep
instanceid 0
2016-08-09T18:25:00,080Z INFO c.b.b.o.c.CppShellRunnable [cp01-yarn-test3]
[TwillContainerService] CppShellRunnable:run(CppShellRunnable.java:87) - sleep
instanceid 1
2016-08-09T18:25:00,878Z INFO c.b.b.o.c.CppShellRunnable [cp01-yarn-test2]
[TwillContainerService] CppShellRunnable:run(CppShellRunnable.java:87) - sleep
instanceid 2
2016-08-09T18:25:00,907Z INFO c.b.b.o.c.CppShellRunnable [cp01-yarn-test3]
[runnable-command-executor]
CppShellRunnable:handleCommand(CppShellRunnable.java:98) - handle command
SimpleCommand{command='instances', options={count=2}} instanceid 1
2016-08-09T18:25:00,886Z INFO o.a.t.i.a.ApplicationMasterService
[cp01-yarn-test2] [message-callback]
ApplicationMasterService:handleSetInstances(ApplicationMasterService.java:735)
- Received change instances request for fetcher, from 3 to 2.
2016-08-09T18:25:00,888Z INFO o.a.t.i.a.ApplicationMasterService
[cp01-yarn-test2] [instanceChanger]
ApplicationMasterService$6:run(ApplicationMasterService.java:756) - Processing
change instance request for fetcher, from 3 to 2.
2016-08-09T18:25:00,890Z INFO o.a.t.i.a.ApplicationMasterService
[cp01-yarn-test2] [instanceChanger]
ApplicationMasterService$6:run(ApplicationMasterService.java:760) - Confirmed 3
containers running for fetcher.
2016-08-09T18:25:00,891Z INFO o.a.t.i.a.RunningContainers [cp01-yarn-test2]
[instanceChanger]
RunningContainers:removeInstanceById(RunningContainers.java:226) - Stopping
service: fetcher fbd0c443-d7b5-4292-a18b-144510c499c4-2
2016-08-09T18:25:00,919Z INFO o.a.t.i.a.ApplicationMasterService
[cp01-yarn-test2] [instanceChanger]
ApplicationMasterService$6:run(ApplicationMasterService.java:776) - Change
instances request completed. From 3 to 2.
2016-08-09T18:25:00,936Z INFO c.b.b.o.c.CppShellRunnable [cp01-yarn-test2]
[runnable-command-executor]
CppShellRunnable:handleCommand(CppShellRunnable.java:98) - handle command
SimpleCommand{command='instances', options={count=2}} instanceid 0
2016-08-09T18:25:01,100Z INFO c.b.b.o.c.CppShellRunnable [cp01-yarn-test2]
[TwillContainerService] CppShellRunnable:run(CppShellRunnable.java:87) - sleep
instanceid 0
2016-08-09T18:25:01,081Z INFO c.b.b.o.c.CppShellRunnable [cp01-yarn-test3]
[TwillContainerService] CppShellRunnable:run(CppShellRunnable.java:87) - sleep
instanceid 1
Because I was afraid my real work could not be terminated, I just made the Twill
Runnable sleep and log the instance ID. I've checked that the TwillLauncher
process is still running. I have no idea why this process was not killed.
Also, the restart API cannot kill the process either; it just launches new
instances.
Thanks!
Haosu Guo
------------------ Original Message ------------------
From: "chtyim" <[email protected]>
Sent: Wednesday, August 10, 2016, 1:50
To: "...the end" <[email protected]>; "dev" <[email protected]>
Subject: Re: A question about Twill changing instances number
Hi Haosu,
Do you have the application master log? It shows whether the AM actually
received the request and tried to terminate the container. Also, if you have
access to the cluster, please check whether the container process has actually
terminated. We've seen cases where the container process keeps running
because of an unterminated user thread, so the container is never returned
to YARN.
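To illustrate the unterminated-thread case, here is a minimal sketch, assuming the runnable's work is a periodic loop. StoppableWorker is a hypothetical plain-Java class, not a Twill API; in a real application the same pattern would go in the TwillRunnable's run() and stop() methods.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stand-in for the body of a runnable: run() loops until
 * stop() is called, then returns so the hosting process can exit.
 */
final class StoppableWorker implements Runnable {
    // Released by stop(); run() waits on it instead of calling Thread.sleep().
    private final CountDownLatch stopLatch = new CountDownLatch(1);
    private final long intervalMillis;

    StoppableWorker(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    @Override
    public void run() {
        try {
            // await() returns true once stop() has been called, false on timeout.
            while (!stopLatch.await(intervalMillis, TimeUnit.MILLISECONDS)) {
                doOneUnitOfWork();
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt status and fall through so run() returns.
            Thread.currentThread().interrupt();
        }
    }

    /** Asks run() to return; safe to call from another thread. */
    void stop() {
        stopLatch.countDown();
    }

    // Placeholder for the real periodic work (an assumption of this sketch).
    private void doOneUnitOfWork() {
    }
}
```

By contrast, a loop built only on Thread.sleep() with no stop check never returns, and any non-daemon thread the user code starts and never ends keeps the JVM alive.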
Terence
On Tue, Aug 9, 2016 at 2:22 AM, ...the end <[email protected]> wrote:
Hi Terence,
I'm a user of Apache Twill and I have a question about changing the
number of instances.
I'm using twill-incubating-0.7.0.0, yarn-2.6.4, and zookeeper-3.4.6. When I
increase the number of instances, it works well. But when I try to decrease
the number of instances, something seems to go wrong.
Here is the log:
16:48:53.365 [ STARTING-SendThread(cp01-yarn-test1:2181)] DEBUG
org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x155de15272f0105,
packet::
clientPath:/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg
serverPath:/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg
finished:false header:: 21,1 replyHeader:: 21,3320,0 request::
'/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg,#7b2274797065223a2253595354454d222c2273636f7065223a2252554e4e41424c45222c2272756e6e61626c654e616d65223a2266657463686572222c22636f6d6d616e64223a7b22636f6d6d616e64223a22696e7374616e636573222c226f7074696f6e73223a7b22636f756e74223a2233227d7d7d,v{s{31,s{'world,'anyone}}},2
response::
'/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg0000000001
16:48:53.369 [ STARTING-SendThread(cp01-yarn-test1:2181)] DEBUG
org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x155de15272f0105,
packet::
clientPath:/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg0000000001
serverPath:/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg0000000001
finished:false header:: 22,3 replyHeader:: 22,3320,0 request::
'/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg0000000001,T
response:: s{3320,3320,1470732533363,1470732533363,0,0,0,0,119,0,3320}
16:48:56.706 [ STARTING-SendThread(cp01-yarn-test1:2181)] DEBUG
org.apache.zookeeper.ClientCnxn - Got ping response for sessionid:
0x155de15272f0105 after 0ms
16:48:59.354 [ STARTING-SendThread(cp01-yarn-test1:2181)] DEBUG
org.apache.zookeeper.ClientCnxn - Got notification sessionid:0x155de15272f0105
16:48:59.354 [ STARTING-SendThread(cp01-yarn-test1:2181)] DEBUG
org.apache.zookeeper.ClientCnxn - Got WatchedEvent state:SyncConnected
type:NodeDeleted
path:/Cpp-Application/8047096c-0a25-40ec-8f21-ca8569c40f8c/messages/msg0000000001
for sessionid 0x155de15272f0105
But the number of running containers shown on the 'Nodes of the Cluster' page
in YARN does not decrease.
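For reference, the hex string after '#' in the ZooKeeper request in the log above is the message body. Here is a minimal sketch for decoding it, assuming the bytes are UTF-8 text; the class and method names are illustrative and not part of Twill or ZooKeeper.

```java
import java.nio.charset.StandardCharsets;

final class ZkPayloadDecoder {

    /** Decodes a hex string (optionally prefixed with '#') into UTF-8 text. */
    static String decodeHex(String hex) {
        String digits = hex.startsWith("#") ? hex.substring(1) : hex;
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] bytes = new byte[digits.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte of the payload.
            bytes[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Payload copied from the ClientCnxn DEBUG line in the log above.
        String payload = "#7b2274797065223a2253595354454d222c2273636f7065223a2252554e4e41424c45222c2272756e6e61626c654e616d65223a2266657463686572222c22636f6d6d616e64223a7b22636f6d6d616e64223a22696e7374616e636573222c226f7074696f6e73223a7b22636f756e74223a2233227d7d7d";
        System.out.println(decodeHex(payload));
        // prints {"type":"SYSTEM","scope":"RUNNABLE","runnableName":"fetcher","command":{"command":"instances","options":{"count":"3"}}}
    }
}
```

The decoded JSON shows which runnable and which instance count were carried by that particular message.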
I use the API like this:
Future<Integer> future = twillController.changeInstances(name, num);
JsonObject result = new JsonObject();
try {
    int newCount = future.get();  // blocks until the future completes
    result.addProperty("status", 0);
    result.addProperty("new_count", newCount);
} catch (InterruptedException | ExecutionException e) {
    if (e instanceof InterruptedException) {
        Thread.currentThread().interrupt();  // restore the interrupt flag
    }
    result.addProperty("status", -1);
    result.addProperty("errMsg", e.getMessage());
    // Pass the exception itself so the message and stack trace are logged.
    LOG.error("set container number error", e);
}
Do you have any idea why this does not work? I look forward to your response.
Thank you!
Haosu Guo