Benjamin Mahler created MESOS-367:
-------------------------------------

             Summary: Invalid StatusUpdateMessage from missing slave id.
                 Key: MESOS-367
                 URL: https://issues.apache.org/jira/browse/MESOS-367
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Mahler
            Priority: Critical


It looks like the ExecutorProcess sets its internal slaveId upon registration:

  void registered(const ExecutorInfo& executorInfo,
                  const FrameworkID& frameworkId,
                  const FrameworkInfo& frameworkInfo,
                  const SlaveID& slaveId,
                  const SlaveInfo& slaveInfo)
  {
    if (aborted) {
      VLOG(1) << "Ignoring registered message from slave " << slaveId
              << " because the driver is aborted!";
      return;
    }

    VLOG(1) << "Executor registered on slave " << slaveId;

    this->slaveId = slaveId;  // <-- slaveId is only set here.
    executor->registered(driver, executorInfo, frameworkInfo, slaveInfo);
  }


As a result, if registration is delayed, the executor can come up and send a
status update before the slaveId is set, which produces an incomplete
protobuf:


  void sendStatusUpdate(const TaskStatus& status)
  {
    VLOG(1) << "Executor sending status update for task "
            << status.task_id() << " in state " << status.state();

    if (status.state() == TASK_STAGING) {
      VLOG(1) << "Executor is not allowed to send "
              << "TASK_STAGING status updates. Aborting!";

      driver->abort();

      executor->error(driver, "Attempted to send TASK_STAGING status update");

      return;
    }

    StatusUpdateMessage message;
    StatusUpdate* update = message.mutable_update();
    update->mutable_framework_id()->MergeFrom(frameworkId);
    update->mutable_executor_id()->MergeFrom(executorId);
    update->mutable_slave_id()->MergeFrom(slaveId);  // <-- slaveId may still be unset here.
    update->mutable_status()->MergeFrom(status);
    update->set_timestamp(Clock::now());
    update->set_uuid(UUID::random().toBytes());

    send(slave, message);
  }
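
To make the ordering problem concrete, here is a minimal, self-contained
sketch of the failure mode. It does not use the real Mesos or protobuf
types: SlaveID, StatusUpdate, buildStatusUpdate() and usageExample() below
are hypothetical stand-ins, with std::optional modelling the required
slave_id.value field.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the generated SlaveID message. The optional
// models the required `value` field named in the parse error.
struct SlaveID {
  std::optional<std::string> value;
  bool IsInitialized() const { return value.has_value(); }
};

// Hypothetical stand-in for StatusUpdate, reduced to the slave_id field.
struct StatusUpdate {
  SlaveID slave_id;
  bool IsInitialized() const { return slave_id.IsInitialized(); }
};

// Simplified executor: slaveId stays default-constructed until registered().
class ExecutorProcess {
public:
  void registered(const SlaveID& id) { slaveId = id; }

  // Mirrors the slave_id handling in sendStatusUpdate().
  StatusUpdate buildStatusUpdate() const {
    StatusUpdate update;
    update.slave_id = slaveId;  // Copies an empty id if called too early.
    return update;
  }

private:
  SlaveID slaveId;
};

// Shows both orderings: update before registration, then after.
inline void usageExample() {
  ExecutorProcess executor;
  assert(!executor.buildStatusUpdate().IsInitialized());  // Too early.

  executor.registered(SlaveID{std::string("slave-1")});
  assert(executor.buildStatusUpdate().IsInitialized());   // Now complete.
}
```

An update built before registered() runs carries no slave id value, which
corresponds to the missing update.slave_id.value in the errors below.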

The ExecutorProcess should take the slaveId in its constructor to avoid this 
issue.
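
A rough sketch of what that could look like, again with a hypothetical
stand-in for SlaveID rather than the real protobuf type. How the executor
would learn its slave id before registration is not covered in this issue,
so the sketch simply takes it as a constructor argument; slaveIdForUpdate()
and usageExample() are made-up names for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the SlaveID protobuf message.
struct SlaveID {
  std::string value;
};

// Sketch: the slave id is a constructor argument, so it is already set
// before any status update can be built.
class ExecutorProcess {
public:
  explicit ExecutorProcess(SlaveID id) : slaveId(std::move(id)) {}

  // registered() no longer needs to assign slaveId.
  void registered() {}

  // The id that sendStatusUpdate() would merge into the update.
  const SlaveID& slaveIdForUpdate() const { return slaveId; }

private:
  const SlaveID slaveId;  // Immutable after construction.
};

// The id is available even though registered() has not been called.
inline void usageExample() {
  ExecutorProcess executor(SlaveID{"slave-1"});
  assert(executor.slaveIdForUpdate().value == "slave-1");
}
```

With the id fixed at construction time, the order in which registration and
the first status update happen no longer affects the message contents.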

Here are the relevant log lines:

I0227 23:45:56.547392 38406 slave.cpp:762] Got registration for executor 'thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0' of framework 201103282247-0000000019-0000
I0227 23:45:56.547610 38411 cgroups_isolation_module.cpp:571] Changing cgroup controls for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000 with resources cpus=0.35; mem=176; disk=512; ports=[31385-31385]
I0227 23:45:56.547863 38406 slave.cpp:820] Flushing queued tasks for framework 201103282247-0000000019-0000
I0227 23:45:56.548074 38411 cgroups_isolation_module.cpp:676] Updated 'cpu.shares' to 358 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
I0227 23:45:56.548812 38411 cgroups_isolation_module.cpp:774] Updated 'memory.limit_in_bytes' to 184549376 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
W0227 23:45:56.663353 38408 protobuf.hpp:252] Initialization errors: update.slave_id.value
libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
W0227 23:45:56.673761 38400 protobuf.hpp:252] Initialization errors: update.slave_id.value

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
