we're at 1.6.0. not sure if it was rebooted, but the symptoms do look
suspiciously similar to MESOS-9501. we're due for an upgrade anyway, will
probably go that route. thanks!

On Fri, Mar 8, 2019 at 4:30 PM Gilbert Song <gilb...@apache.org> wrote:

> Hi Eric,
>
> What is your Mesos version?
>
> Did you reboot the agent machine before the task got stuck?
> If yes, it is probably
> https://issues.apache.org/jira/browse/MESOS-9501
>
> Did you enable health checks for that task?
> Doing so may increase the chance of an FD leak:
>
> https://issues.apache.org/jira/browse/MESOS-9502
>
> -Gilbert
>
>
> On Fri, Mar 8, 2019 at 3:03 PM Eric Chung <ech...@uber.com.invalid> wrote:
>
> > Hello devs,
> >
> > We recently ran into a situation where the agent started terminating a
> > task's executor due to a registration timeout, but neither the executor
> > nor the task was ever actually killed, and the task has been stuck in
> > queued_tasks for days.
> >
> > The relevant log:
> >
> > I0305 08:43:59.069857  5215 slave.cpp:6803] Terminating executor
> > '<executor_id>' of framework <framework_id> because it did not
> > register within 15mins
> > I0305 09:16:28.266021  5200 slave.cpp:3644] Asked to kill task
> > <task_id> of framework <framework_id>
> > W0305 09:16:28.266063  5200 slave.cpp:3816] Ignoring kill task
> > <task_id> because the executor '<executor_id>' of framework
> > <framework_id> is terminating
> >
> >
> > after which the following just keeps repeating:
> >
> > I0305 09:16:28.266021  5200 slave.cpp:3644] Asked to kill task
> > <task_id> of framework <framework_id>
> > W0305 09:16:28.266063  5200 slave.cpp:3816] Ignoring kill task
> > <task_id> because the executor '<executor_id>' of framework
> > <framework_id> is terminating
> >
> >
> > The agent state indicates that it doesn't have any active tasks, but quite
> > a few queued tasks.
> >
> > Does anyone have any insight on why this might be happening?
> >
> > Thanks,
> > Eric
> >
>
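
For anyone trying to spot the same symptom on other agents, here is a minimal log-scanning sketch. It assumes each glog entry sits on one physical line (the excerpts above were wrapped by the mail client) and that the "Ignoring kill task ... is terminating" wording matches the 1.6.0 agent output quoted above; the function name `find_stuck_kills` and the repeat threshold are hypothetical choices for illustration, not part of Mesos.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumption: one glog entry per line, worded like the agent log quoted above.
IGNORED_KILL = re.compile(
    r"Ignoring kill task (?P<task>\S+) because the executor "
    r"'(?P<executor>[^']+)' of framework (?P<framework>\S+) is terminating"
)


def find_stuck_kills(
    lines: Iterable[str], threshold: int = 2
) -> Dict[Tuple[str, str, str], int]:
    """Count ignored kill requests per (framework, executor, task).

    Only keys seen at least `threshold` times are returned; a growing
    count hints that the task is stuck behind a terminating executor.
    The default threshold is an arbitrary, hypothetical choice.
    """
    counts: Counter = Counter()
    for line in lines:
        match = IGNORED_KILL.search(line)
        if match:
            counts[(match["framework"], match["executor"], match["task"])] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    sample = [
        "I0305 09:16:28.266021  5200 slave.cpp:3644] Asked to kill task T1 of framework F1",
        "W0305 09:16:28.266063  5200 slave.cpp:3816] Ignoring kill task T1 "
        "because the executor 'E1' of framework F1 is terminating",
        "W0305 09:17:28.266063  5200 slave.cpp:3816] Ignoring kill task T1 "
        "because the executor 'E1' of framework F1 is terminating",
    ]
    print(find_stuck_kills(sample))  # {('F1', 'E1', 'T1'): 2}
```

This only surfaces candidates from the log; confirming the stuck state still means checking the agent's state for the task under queued_tasks.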
