Hi Eric,

What is your Mesos version?

Did you reboot the agent machine before the task got stuck?
If yes, it is probably https://issues.apache.org/jira/browse/MESOS-9501

Did you enable health checks for that task?
They can increase the chance of an FD leak:
https://issues.apache.org/jira/browse/MESOS-9502
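
If they are enabled, one quick way to look for that leak is to watch the
process's open FD count over time. A minimal sketch (Linux only; assumes
you pass it the PID of the mesos-agent or executor process):

import os
import sys
import time

def fd_count(pid: int) -> int:
    # Count the entries in /proc/<pid>/fd (Linux only).
    return len(os.listdir(f"/proc/{pid}/fd"))

if __name__ == "__main__":
    pid = int(sys.argv[1])  # PID of the mesos-agent (or executor) process
    for _ in range(10):
        print(f"pid {pid}: {fd_count(pid)} open fds")
        time.sleep(60)  # a count that climbs steadily suggests a leak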

-Gilbert


On Fri, Mar 8, 2019 at 3:03 PM Eric Chung <ech...@uber.com.invalid> wrote:

> Hello devs,
>
> We recently ran into a situation where a task's executor was killed due to
> registration timeout, but neither the executor nor the task was properly
> killed, and the task has been stuck in queued_tasks for days.
>
> The relevant log:
>
> I0305 08:43:59.069857  5215 slave.cpp:6803] Terminating executor
> '<executor_id>' of framework <framework_id> because it did not
> register within 15mins
> I0305 09:16:28.266021  5200 slave.cpp:3644] Asked to kill task
> <task_id> of framework <framework_id>
> W0305 09:16:28.266063  5200 slave.cpp:3816] Ignoring kill task
> <task_id> because the executor '<executor_id>' of framework
> <framework_id> is terminating
>
>
> where the following just keeps repeating:
>
> I0305 09:16:28.266021  5200 slave.cpp:3644] Asked to kill task
> <task_id> of framework <framework_id>
> W0305 09:16:28.266063  5200 slave.cpp:3816] Ignoring kill task
> <task_id> because the executor '<executor_id>' of framework
> <framework_id> is terminating
>
>
> the agent state indicates that it doesn't have any active tasks, but quite
> a few queued tasks.
>
> Does anyone have any insight on why this might be happening?
>
> Thanks,
> Eric
>
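
Re: the agent state above: to see exactly what is sitting in queued_tasks,
something along these lines against the agent's /state endpoint can help.
A minimal sketch; it assumes the default agent port 5051 and the
third-party `requests` library:

import requests

# Fetch the agent's state snapshot (adjust host/port for your deployment).
state = requests.get("http://localhost:5051/state").json()

for framework in state.get("frameworks", []):
    for executor in framework.get("executors", []):
        for task in executor.get("queued_tasks", []):
            # Field names follow the agent's JSON model; "name" may be
            # absent depending on version, hence the .get() fallback.
            print(framework["id"], executor["id"],
                  task["id"], task.get("name", ""))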
