Put another way, we currently don't guarantee in-order task delivery to the executor. Due to the changes for MESOS-1720, one special case of task re-ordering now leads to the re-ordered task being dropped (rather than delivered out-of-order as before). Technically, this is strictly better.
However, we'd like to start guaranteeing in-order task delivery.

On Thu, Mar 1, 2018 at 2:56 PM, Meng Zhu <m...@mesosphere.com> wrote:

> Hi all:
>
> TLDR: In Mesos 1.5, tasks may be explicitly dropped by the agent
> if all three conditions are met:
> (1) Several `LAUNCH_TASK` or `LAUNCH_GROUP` calls
> use the same executor.
> (2) The executor currently does not exist on the agent.
> (3) Due to some race conditions, these tasks are trying to launch
> on the agent in a different order from their original launch order.
>
> In this case, tasks that are trying to launch on the agent
> before the first task in the original order will be explicitly dropped by
> the agent (`TASK_DROPPED` or `TASK_LOST` will be sent).
>
> This bug will be fixed in 1.5.1. It is tracked in
> https://issues.apache.org/jira/browse/MESOS-8624
>
> ----
>
> In https://issues.apache.org/jira/browse/MESOS-1720, we introduced an
> ordering dependency between two `LAUNCH`/`LAUNCH_GROUP`
> calls to a new executor. The master would specify that the first call is
> the one to launch a new executor through the `launch_executor` field in
> `RunTaskMessage`/`RunTaskGroupMessage`, and the second one should
> use the existing executor launched by the first one.
>
> On the agent side, running a task/task group goes through a series of
> continuations. One is `collect()` on the future that unschedules
> frameworks from being GC'ed:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
> Another is `collect()` on task authorization:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
> Since these `collect()` calls run on individual actors, the futures of the
> `collect()` calls for two `LAUNCH`/`LAUNCH_GROUP` calls may return
> out-of-order, even if the futures these two `collect()` calls wait for are
> satisfied in order (which is true in these two cases).
>
> As a result, under some race conditions (probably under heavy load),
> tasks that rely on a previous task to launch the executor may
> get processed before the task that is supposed to launch the executor
> first, resulting in the tasks being explicitly dropped by the agent.
>
> -Meng
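
For anyone who wants to see the drop condition in isolation, here is a minimal toy model in Python. It is only a sketch and not the agent's actual code: the `Task` and `Agent` classes and the `deliver` helper are hypothetical names, and the only thing borrowed from the thread is the idea of a `launch_executor` flag that marks which task is expected to create the executor. It assumes the agent drops any task that expects an existing executor when none exists yet.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    executor_id: str
    # Mirrors the `launch_executor` field mentioned in the thread:
    # True only for the task that is expected to create the executor.
    launch_executor: bool


@dataclass
class Agent:
    """Hypothetical, heavily simplified stand-in for the agent."""
    executors: set = field(default_factory=set)
    launched: list = field(default_factory=list)
    dropped: list = field(default_factory=list)

    def run_task(self, task: Task) -> None:
        if task.launch_executor:
            # This task creates the executor and then runs on it.
            self.executors.add(task.executor_id)
            self.launched.append(task.task_id)
        elif task.executor_id in self.executors:
            # The executor already exists, so the task can be delivered.
            self.launched.append(task.task_id)
        else:
            # The task expected an existing executor but arrived too early.
            self.dropped.append(task.task_id)


def deliver(tasks) -> Agent:
    """Feed tasks to a fresh agent in the given arrival order."""
    agent = Agent()
    for task in tasks:
        agent.run_task(task)
    return agent


t1 = Task("t1", "e1", launch_executor=True)
t2 = Task("t2", "e1", launch_executor=False)

in_order = deliver([t1, t2])   # original launch order
reordered = deliver([t2, t1])  # t2 overtakes t1 on the agent

print("in order :", in_order.launched, in_order.dropped)
print("reordered:", reordered.launched, reordered.dropped)
```

In the reordered run, `t2` is the only task dropped, and `t1` still launches the executor afterwards. This matches the three conditions in the TLDR, but it says nothing about how the real agent reaches that state.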