CORRECTION:

This is a new behavior that only appears in the current 1.5.x branch. In
1.5.0, the Mesos agent still has the old behavior, namely, any reordered
tasks (to the same executor) are launched regardless.

On Fri, Mar 2, 2018 at 9:41 AM, Chun-Hung Hsiao <chhs...@mesosphere.io>
wrote:

> Gilbert, I think you're right. The code path doesn't exist in 1.5.0.
>
> On Mar 2, 2018 9:36 AM, "Chun-Hung Hsiao" <chhs...@mesosphere.io> wrote:
>
> > This is a new behavior we have after solving MESOS-1720, and thus a new
> > problem only in 1.5.x. Prior to 1.5, reordered tasks (to the same
> > executor) were launched anyway, because whichever task came first would
> > launch the executor. Since 1.5, one might be dropped.
> >
> > On Mar 1, 2018 4:36 PM, "Gilbert Song" <gilb...@mesosphere.io> wrote:
> >
> >> Meng,
> >>
> >> Could you double check if this is really an issue in the Mesos 1.5.0
> >> release?
> >>
> >> MESOS-1720 <https://issues.apache.org/jira/browse/MESOS-1720> was
> >> resolved after the 1.5 release (rc-2), and it seems like it is only
> >> on the master branch and the 1.5.x branch (not 1.5.0).
> >>
> >> Did I miss anything?
> >>
> >> - Gilbert
> >>
> >> On Thu, Mar 1, 2018 at 4:22 PM, Benjamin Mahler <bmah...@apache.org>
> >> wrote:
> >>
> >> > Put another way, we currently don't guarantee in-order task delivery
> >> > to the executor. Due to the changes for MESOS-1720, one special case
> >> > of task re-ordering now leads to the re-ordered task being dropped
> >> > (rather than delivered out-of-order as before). Technically, this is
> >> > strictly better.
> >> >
> >> > However, we'd like to start guaranteeing in-order task delivery.
> >> >
> >> > On Thu, Mar 1, 2018 at 2:56 PM, Meng Zhu <m...@mesosphere.com> wrote:
> >> >
> >> >> Hi all:
> >> >>
> >> >> TLDR: In Mesos 1.5, tasks may be explicitly dropped by the agent
> >> >> if all three conditions are met:
> >> >> (1) Several `LAUNCH_TASK` or `LAUNCH_GROUP` calls use the same
> >> >> executor.
> >> >> (2) The executor does not currently exist on the agent.
> >> >> (3) Due to some race condition, these tasks are processed by the
> >> >> agent in a different order from their original launch order.
> >> >>
> >> >> In this case, any task that is processed by the agent before the
> >> >> task that was first in the original order will be explicitly
> >> >> dropped (`TASK_DROPPED` or `TASK_LOST` will be sent).
> >> >>
> >> >> This bug will be fixed in 1.5.1. It is tracked in
> >> >> https://issues.apache.org/jira/browse/MESOS-8624
> >> >>
> >> >> ----
> >> >>
> >> >> In https://issues.apache.org/jira/browse/MESOS-1720, we introduced
> >> >> an ordering dependency between two `LAUNCH`/`LAUNCH_GROUP` calls to
> >> >> a new executor. The master would specify, through the
> >> >> `launch_executor` field in `RunTaskMessage`/`RunTaskGroupMessage`,
> >> >> that the first call is the one to launch a new executor, and that
> >> >> the second one should use the existing executor launched by the
> >> >> first.
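> >> >>
> >> >> To make the dependency concrete, here is a minimal self-contained
> >> >> C++ sketch of the agent-side decision. All names and helpers are
> >> >> invented for illustration; the real logic lives in
> >> >> src/slave/slave.cpp and is more involved:
> >> >>
> >> >>   #include <iostream>
> >> >>   #include <set>
> >> >>   #include <string>
> >> >>
> >> >>   // Hypothetical stand-ins, not the real Mesos types.
> >> >>   struct Task { std::string name; std::string executorId; };
> >> >>
> >> >>   std::set<std::string> runningExecutors;
> >> >>
> >> >>   // Sketch of the check introduced by MESOS-1720: `launchExecutor`
> >> >>   // mirrors the `launch_executor` field set by the master.
> >> >>   std::string runTask(const Task& task, bool launchExecutor)
> >> >>   {
> >> >>     if (launchExecutor) {
> >> >>       // This call was designated to launch a new executor.
> >> >>       runningExecutors.insert(task.executorId);
> >> >>     } else if (runningExecutors.count(task.executorId) == 0) {
> >> >>       // The master expected an existing executor, but the call
> >> >>       // meant to launch it has not been processed yet (the calls
> >> >>       // were reordered): the task is explicitly dropped.
> >> >>       return "TASK_DROPPED";
> >> >>     }
> >> >>     return "TASK_QUEUED";  // Deliver the task to the executor.
> >> >>   }
> >> >>
> >> >>   int main()
> >> >>   {
> >> >>     Task first{"first", "exec"}, second{"second", "exec"};
> >> >>
> >> >>     // Reordered processing: `second` (launch_executor = false) is
> >> >>     // handled before `first` (launch_executor = true) and dropped.
> >> >>     std::cout << runTask(second, false) << std::endl;  // TASK_DROPPED
> >> >>     std::cout << runTask(first, true) << std::endl;    // TASK_QUEUED
> >> >>   }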
> >> >>
> >> >> On the agent side, running a task/task group goes through a series
> >> >> of continuations. One is a `collect()` on the future that
> >> >> unschedules the framework from being GC'd:
> >> >> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
> >> >> Another is a `collect()` on task authorization:
> >> >> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
> >> >> Since these `collect()` calls run on individual actors, the futures
> >> >> of the `collect()` calls for two `LAUNCH`/`LAUNCH_GROUP` calls may
> >> >> return out of order, even if the futures these two `collect()` calls
> >> >> wait for are satisfied in order (which is true in both cases).
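> >> >>
> >> >> A plain-C++ analogy of that reordering (std::thread standing in for
> >> >> independent actors; this is not libprocess code): even when two
> >> >> inputs are satisfied in order, continuations running on independent
> >> >> execution contexts may complete out of order.
> >> >>
> >> >>   #include <chrono>
> >> >>   #include <iostream>
> >> >>   #include <mutex>
> >> >>   #include <thread>
> >> >>
> >> >>   int main()
> >> >>   {
> >> >>     std::mutex mu;
> >> >>
> >> >>     // Each "call" runs its continuation on its own thread, the way
> >> >>     // each `collect()` runs on its own actor. The delay simulates
> >> >>     // scheduling jitter between the two actors.
> >> >>     auto continuation = [&mu](int call, int delayMs) {
> >> >>       std::this_thread::sleep_for(std::chrono::milliseconds(delayMs));
> >> >>       std::lock_guard<std::mutex> lock(mu);
> >> >>       std::cout << "call " << call << " continuation ran" << std::endl;
> >> >>     };
> >> >>
> >> >>     // Call 1's input is satisfied first, call 2's second, yet call
> >> >>     // 2's continuation will almost certainly run first here, i.e.
> >> >>     // the task that must reuse the executor is processed before the
> >> >>     // task that launches it, and gets dropped.
> >> >>     std::thread t1(continuation, 1, 50);
> >> >>     std::thread t2(continuation, 2, 10);
> >> >>     t1.join();
> >> >>     t2.join();
> >> >>   }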
> >> >>
> >> >> As a result, under some race conditions (probably under heavy
> >> >> load), a task that relies on a previous task to launch the executor
> >> >> may get processed before the task that is supposed to launch the
> >> >> executor, resulting in the task being explicitly dropped by the
> >> >> agent.
> >> >>
> >> >> -Meng
> >> >>
> >> >>
> >> >>
> >> >
> >>
> >
>
