[jira] [Commented] (MESOS-8317) Check failed when newly registered executor has launched tasks.
[ https://issues.apache.org/jira/browse/MESOS-8317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284945#comment-16284945 ]

James Peach commented on MESOS-8317:

The executor failed because it had older protobufs than the scheduler. It was using the JSON content type, and the Go jsonpb package fails on any field that it doesn't know about. The field in question was the {{protocol}} field in the {{HealthCheck}} message.

> Check failed when newly registered executor has launched tasks.
> ---
>
> Key: MESOS-8317
> URL: https://issues.apache.org/jira/browse/MESOS-8317
> Project: Mesos
> Issue Type: Bug
> Reporter: James Peach
>
> This check in {{slave/slave.cpp}} can fail:
> {code}
> if (state != RECOVERING &&
>     executor->queuedTasks.empty() &&
>     executor->queuedTaskGroups.empty()) {
>   CHECK(executor->launchedTasks.empty())
>     << " Newly registered executor '" << executor->id
>     << "' has launched tasks";
>
>   LOG(WARNING) << "Shutting down the executor " << *executor
>                << " because it has no tasks to run";
>
>   _shutdownExecutor(framework, executor);
>
>   return;
> }
> {code}
> This happens with the following sequence of events:
> 1. HTTP executor subscribes
> 2. Agent sends a LAUNCH message that the executor can't decode
> 3. HTTP executor closes the channel and re-subscribes
> 4. Agent hits the above check because the executor sends an empty task list
>    (it never understood the LAUNCH message), but the agent thinks that a task
>    should have been launched.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
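The sequence of events in the issue description can be walked through with a small model. This is only a sketch under stated assumptions: `Executor`, `sendLaunch`, `checkWouldFail`, and `reproduceSequence` are hypothetical stand-ins, not the real Mesos types or functions, and the model assumes the agent moves a task from its queued list to its launched list as soon as it sends LAUNCH, whether or not the executor could decode the message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the agent's per-executor bookkeeping.
// This is NOT the real Mesos Executor struct.
struct Executor {
  std::vector<std::string> queuedTasks;
  std::vector<std::string> launchedTasks;
};

// Assumption: sending LAUNCH moves tasks from queued to launched on the agent
// side, regardless of whether the executor understood the message.
void sendLaunch(Executor& executor) {
  executor.launchedTasks.insert(
      executor.launchedTasks.end(),
      executor.queuedTasks.begin(),
      executor.queuedTasks.end());
  executor.queuedTasks.clear();
}

// Mirrors the shape of the condition in the issue (ignoring task groups and
// the RECOVERING state): returns true when the CHECK would abort the agent.
bool checkWouldFail(const Executor& executor) {
  return executor.queuedTasks.empty() && !executor.launchedTasks.empty();
}

// Replays the four steps from the issue description against the model.
bool reproduceSequence() {
  Executor executor;
  executor.queuedTasks.push_back("task-1");  // hypothetical task ID

  // 1. HTTP executor subscribes (no state change in this model).
  // 2. Agent sends a LAUNCH message that the executor can't decode.
  sendLaunch(executor);
  // 3. HTTP executor closes the channel and re-subscribes; it reports no
  //    tasks, so nothing on the agent side is reset.
  // 4. Agent evaluates the check on re-subscription.
  return checkWouldFail(executor);
}
```

In this model `reproduceSequence()` returns true: nothing is queued any more, yet the agent still records a launched task, which is the state the CHECK rejects.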
[jira] [Commented] (MESOS-8317) Check failed when newly registered executor has launched tasks.
[ https://issues.apache.org/jira/browse/MESOS-8317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284396#comment-16284396 ]

Vinod Kone commented on MESOS-8317:
---

Was an incompatibility accidentally introduced in the `LAUNCH` message that prevented an executor from decoding it?

Regardless, it's better for the agent to handle this gracefully than to fail a hard CHECK.
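One possible shape for the graceful handling suggested above is to turn the inconsistent state into an ordinary decision instead of an assertion. The following is a minimal sketch, not the actual Mesos change: `ExecutorState`, `SubscribeAction`, and `onSubscribe` are hypothetical names, and the counters stand in for the agent's `queuedTasks`, `queuedTaskGroups`, and `launchedTasks` containers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical outcome of handling an executor (re-)subscription.
enum class SubscribeAction {
  Proceed,               // Executor still has work queued; carry on.
  ShutdownIdle,          // Nothing queued or launched; shut down as before.
  ShutdownInconsistent   // Launched tasks the executor never acknowledged.
};

// Hypothetical snapshot of the state the original check inspects.
struct ExecutorState {
  bool agentRecovering;          // Corresponds to state == RECOVERING.
  std::size_t queuedTasks;
  std::size_t queuedTaskGroups;
  std::size_t launchedTasks;
};

// Same guard as the snippet in the issue, but the case that used to trip the
// CHECK is reported to the caller, which can log it and shut the executor
// down rather than aborting the whole agent.
SubscribeAction onSubscribe(const ExecutorState& s) {
  if (s.agentRecovering || s.queuedTasks > 0 || s.queuedTaskGroups > 0) {
    return SubscribeAction::Proceed;
  }

  if (s.launchedTasks > 0) {
    return SubscribeAction::ShutdownInconsistent;
  }

  return SubscribeAction::ShutdownIdle;
}
```

With this shape, the state reached in MESOS-8317 (nothing queued, one task launched) maps to `ShutdownInconsistent`. What the agent should then do with the affected tasks is a separate design question that the sketch does not answer.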
[jira] [Commented] (MESOS-8317) Check failed when newly registered executor has launched tasks.
[ https://issues.apache.org/jira/browse/MESOS-8317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284378#comment-16284378 ]

James Peach commented on MESOS-8317:

/cc [~vinodkone]