Hmm, after checking src/launcher/executor.cpp, I think it is a bug that the executor didn't reap the task status.
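For anyone else hitting this, here is a minimal sketch (not taken from the Mesos codebase) that lists executors which have no running or queued tasks but still sit under frameworks[].executors. It only relies on the field names visible in the truncated slave(1)/state output quoted further down this thread. The function name lingering_executors and the trimmed SAMPLE_STATE are hypothetical, made up for illustration.

```python
import json


def lingering_executors(state):
    """Return (framework_id, executor_id) pairs for executors that have no
    running or queued tasks but at least one completed task.

    Hypothetical helper: it assumes the agent state layout shown in this
    thread (frameworks[].executors[] with tasks, queued_tasks and
    completed_tasks).
    """
    found = []
    for framework in state.get("frameworks", []):
        for executor in framework.get("executors", []):
            has_active = executor.get("tasks") or executor.get("queued_tasks")
            completed = executor.get("completed_tasks", [])
            if not has_active and completed:
                # The framework id is read from the completed task because
                # that is the field shown in the quoted output.
                framework_id = completed[0].get("framework_id")
                found.append((framework_id, executor.get("id")))
    return found


# Trimmed copy of the state quoted in this thread (elided fields left out).
SAMPLE_STATE = json.loads("""
{
  "frameworks": [
    {
      "executors": [
        {
          "queued_tasks": [],
          "tasks": [],
          "completed_tasks": [
            {
              "framework_id": "f65b163c-0faf-441f-ac14-91739fa4394c-0000",
              "id": "service.a3b609b8-27ec-11e6-8044-02c89eb9127e",
              "state": "TASK_KILLED"
            }
          ],
          "id": "service.a3b609b8-27ec-11e6-8044-02c89eb9127e"
        }
      ]
    }
  ]
}
""")

if __name__ == "__main__":
    for framework_id, executor_id in lingering_executors(SAMPLE_STATE):
        print(framework_id, executor_id)
```

In practice you would feed it the JSON saved from the agent's state endpoint instead of SAMPLE_STATE. An executor showing up in this list is only a hint that it may be stuck, not proof.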

On Sat, Jun 4, 2016 at 6:50 PM, Tomek Janiszewski <[email protected]> wrote:

> Too late. Sandbox was collected by GC.
>
> On Sat, Jun 4, 2016 at 12:45, haosdent <[email protected]> wrote:
>
> > Usually the executor terminates itself once it reaps a task status of
> > killed or finished.
> > Otherwise, either the reap callback has not been registered yet or our
> > executor has a bug when reaping the task status. Could you find
> > something in the executor stdout/stderr?
> >
> > On Sat, Jun 4, 2016 at 6:08 PM, Tomek Janiszewski <[email protected]>
> > wrote:
> >
> > > Thanks. I just manually found that executor's pid and killed it. Any
> > > idea why it was still running without tasks?
> > >
> > > On Sat, Jun 4, 2016 at 05:35, haosdent <[email protected]> wrote:
> > >
> > > > > 13:33:39.031054 [slave.cpp:2643] Got registration for executor 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000 from executor(1)@10.55.97.170:60083
> > > >
> > > > Yes, according to your log, your executor is still running. If your
> > > > executor is http_command_executor, you could use
> > > >
> > > > https://github.com/apache/mesos/blob/master/docs/executor-http-api.md#shutdown
> > > >
> > > > to shut it down.
> > > > If it is another type of executor, as far as I know there is no API
> > > > to shut it down. I'm not sure whether killing the executor on the
> > > > agent would resolve your problem.
> > > >
> > > > On Fri, Jun 3, 2016 at 4:33 PM, Tomek Janiszewski <[email protected]>
> > > > wrote:
> > > >
> > > > > Here is the truncated response from slave(1)/state:
> > > > >
> > > > > {
> > > > >   "attributes": {...},
> > > > >   "completed_frameworks": [],
> > > > >   "flags": {...},
> > > > >   "frameworks": [
> > > > >     {
> > > > >       "checkpoint": true,
> > > > >       "completed_executors": [...],
> > > > >       "executors": [
> > > > >         {
> > > > >           "queued_tasks": [],
> > > > >           "tasks": [],
> > > > >           "completed_tasks": [
> > > > >             {
> > > > >               "discovery": {...},
> > > > >               "executor_id": "",
> > > > >               "framework_id": "f65b163c-0faf-441f-ac14-91739fa4394c-0000",
> > > > >               "id": "service.a3b609b8-27ec-11e6-8044-02c89eb9127e",
> > > > >               "labels": [...],
> > > > >               "name": "service",
> > > > >               "resources": {...},
> > > > >               "slave_id": "ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13",
> > > > >               "state": "TASK_KILLED",
> > > > >               "statuses": []
> > > > >             }
> > > > >           ],
> > > > >           "container": "ead42e63-ac92-4ad0-a99c-4af9c3fa5e31",
> > > > >           "directory": "...",
> > > > >           "id": "service.a3b609b8-27ec-11e6-8044-02c89eb9127e",
> > > > >           "name": "Command Executor (Task: service.a3b609b8-27ec-11e6-8044-02c89eb9127e) (Command: sh -c 'cd service...')",
> > > > >           "resources": {...},
> > > > >           "source": "service.a3b609b8-27ec-11e6-8044-02c89eb9127e"
> > > > >         },
> > > > >         ...
> > > > >       ],
> > > > >     }
> > > > >   ],
> > > > >   "git_sha": "961edbd82e691a619a4c171a7aadc9c32957fa73",
> > > > >   "git_tag": "0.28.0",
> > > > >   "version": "0.28.0",
> > > > >   ...
> > > > > }
> > > > >
> > > > > Here is the log for this container:
> > > > >
> > > > > > 13:33:19.479182 [slave.cpp:1361] Got assigned task service.a3b609b8-27ec-11e6-8044-02c89eb9127e for framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > > > 13:33:19.482566 [slave.cpp:1480] Launching task service.a3b609b8-27ec-11e6-8044-02c89eb9127e for framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > > > 13:33:19.483921 [paths.cpp:528] Trying to chown '/tmp/mesos/slaves/ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13/frameworks/f65b163c-0faf-441f-ac14-91739fa4394c-0000/executors/service.a3b609b8-27ec-11e6-8044-02c89eb9127e/runs/ead42e63-ac92-4ad0-a99c-4af9c3fa5e31' to user 'mesosuser'
> > > > > > 13:33:19.504173 [slave.cpp:5367] Launching executor service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/tmp/mesos/slaves/ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13/frameworks/f65b163c-0faf-441f-ac14-91739fa4394c-0000/executors/service.a3b609b8-27ec-11e6-8044-02c89eb9127e/runs/ead42e63-ac92-4ad0-a99c-4af9c3fa5e31'
> > > > > > 13:33:19.505537 [containerizer.cpp:666] Starting container 'ead42e63-ac92-4ad0-a99c-4af9c3fa5e31' for executor 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' of framework 'f65b163c-0faf-441f-ac14-91739fa4394c-0000'
> > > > > > 13:33:19.505734 [slave.cpp:1698] Queuing task 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' for executor 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > > ...
> > > > > > 13:33:19.977483 [containerizer.cpp:1118] Checkpointing executor's forked pid 25576 to '/tmp/mesos/meta/slaves/ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13/frameworks/f65b163c-0faf-441f-ac14-91739fa4394c-0000/executors/service.a3b609b8-27ec-11e6-8044-02c89eb9127e/runs/ead42e63-ac92-4ad0-a99c-4af9c3fa5e31/pids/forked.pid'
> > > > > > 13:33:35.775195 [slave.cpp:1891] Asked to kill task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > > > 13:33:35.775645 [slave.cpp:3002] Handling status update TASK_KILLED (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000 from @0.0.0.0:0
> > > > > > 13:33:35.778105 [cpushare.cpp:389] Updated 'cpu.shares' to 102 (cpus 0.1) for container ead42e63-ac92-4ad0-a99c-4af9c3fa5e31
> > > > > > 13:33:35.778488 [disk.cpp:169] Updating the disk resources for container ead42e63-ac92-4ad0-a99c-4af9c3fa5e31 to cpus(*):0.1; mem(*):32
> > > > > > 13:33:35.780349 [mem.cpp:353] Updated 'memory.soft_limit_in_bytes' to 32MB for container ead42e63-ac92-4ad0-a99c-4af9c3fa5e31
> > > > > > 13:33:35.782573 [status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > > > 13:33:35.783860 [status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_KILLED (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > > > 13:33:35.788767 [slave.cpp:3400] Forwarding the update TASK_KILLED (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000 to [email protected]:5050
> > > > > > 13:33:35.917932 [status_update_manager.cpp:392] Received status update acknowledgement (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > > > 13:33:35.918143 [status_update_manager.cpp:824] Checkpointing ACK for status update TASK_KILLED (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > > ...
> > > > > > 13:33:39.031054 [slave.cpp:2643] Got registration for executor 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000 from executor(1)@10.55.97.170:60083
> > > > >
> > > > > The container shown is no longer running, yet it still appears as
> > > > > running. What should I do with it?
> > > > >
> > > > > Thanks
> > > > > Tomek
> > > > >
> > > > > On Thu, Jun 2, 2016 at 15:55, Tomek Janiszewski <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Yes. I see a dead executor in executors. Its tasks and
> > > > > > queued_tasks are empty, but there is one task in
> > > > > > completed_tasks. frameworks.completed_executors is filled with
> > > > > > other executors.
> > > > > >
> > > > > > On Thu, Jun 2, 2016 at 15:39, haosdent <[email protected]> wrote:
> > > > > >
> > > > > > > Hi @janiszt, on my side the completed executors only exist in
> > > > > > > completed_frameworks.completed_executors or
> > > > > > > frameworks.completed_executors.
> > > > > > >
> > > > > > > On your side, does completed_executors exist in any other
> > > > > > > fields?
> > > > > > >
> > > > > > > On Thu, Jun 2, 2016 at 5:39 PM, Tomek Janiszewski <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > I'm running Mesos 0.28.0. The Mesos slave(1)/state endpoint
> > > > > > > > returns some completed executors not in
> > > > > > > > frameworks.completed_executors but in frameworks.executors.
> > > > > > > > Is this normal behavior? How can I force Mesos to move
> > > > > > > > completed executors into frameworks.completed_executors?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Tomek
> > > > > > >
> > > > > > > --
> > > > > > > Best Regards,
> > > > > > > Haosdent Huang

--
Best Regards,
Haosdent Huang
