Benjamin Mahler created MESOS-367:
-------------------------------------
Summary: Invalid StatusUpdateMessage from missing slave id.
Key: MESOS-367
URL: https://issues.apache.org/jira/browse/MESOS-367
Project: Mesos
Issue Type: Bug
Reporter: Benjamin Mahler
Priority: Critical
It looks like the ExecutorProcess sets its internal slaveId upon registration:
    void registered(const ExecutorInfo& executorInfo,
                    const FrameworkID& frameworkId,
                    const FrameworkInfo& frameworkInfo,
                    const SlaveID& slaveId,
                    const SlaveInfo& slaveInfo)
    {
      if (aborted) {
        VLOG(1) << "Ignoring registered message from slave " << slaveId
                << " because the driver is aborted!";
        return;
      }

      VLOG(1) << "Executor registered on slave " << slaveId;

      this->slaveId = slaveId;  // <-- slaveId is only assigned here, on registration.

      executor->registered(driver, executorInfo, frameworkInfo, slaveInfo);
    }
As a result, if registration is delayed, the executor can start up and send a
status update before slaveId has been set. This produces an incomplete protobuf
in which the required update.slave_id field is unset:
    void sendStatusUpdate(const TaskStatus& status)
    {
      VLOG(1) << "Executor sending status update for task "
              << status.task_id() << " in state " << status.state();

      if (status.state() == TASK_STAGING) {
        VLOG(1) << "Executor is not allowed to send "
                << "TASK_STAGING status updates. Aborting!";
        driver->abort();
        executor->error(driver, "Attempted to send TASK_STAGING status update");
        return;
      }

      StatusUpdateMessage message;
      StatusUpdate* update = message.mutable_update();
      update->mutable_framework_id()->MergeFrom(frameworkId);
      update->mutable_executor_id()->MergeFrom(executorId);
      update->mutable_slave_id()->MergeFrom(slaveId);  // <-- still empty if registered() has not run yet.
      update->mutable_status()->MergeFrom(status);
      update->set_timestamp(Clock::now());
      update->set_uuid(UUID::random().toBytes());

      send(slave, message);
    }
The ExecutorProcess should take the slaveId in its constructor, so that it is
already set before any status update can be sent.
Here are the relevant log lines:
I0227 23:45:56.547392 38406 slave.cpp:762] Got registration for executor 'thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0' of framework 201103282247-0000000019-0000
I0227 23:45:56.547610 38411 cgroups_isolation_module.cpp:571] Changing cgroup controls for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000 with resources cpus=0.35; mem=176; disk=512; ports=[31385-31385]
I0227 23:45:56.547863 38406 slave.cpp:820] Flushing queued tasks for framework 201103282247-0000000019-0000
I0227 23:45:56.548074 38411 cgroups_isolation_module.cpp:676] Updated 'cpu.shares' to 358 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
I0227 23:45:56.548812 38411 cgroups_isolation_module.cpp:774] Updated 'memory.limit_in_bytes' to 184549376 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
W0227 23:45:56.663353 38408 protobuf.hpp:252] Initialization errors: update.slave_id.value
libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
W0227 23:45:56.673761 38400 protobuf.hpp:252] Initialization errors: update.slave_id.value