Re: Tasks may be explicitly dropped by agent in Mesos 1.5

Chun-Hung Hsiao Fri, 02 Mar 2018 09:42:02 -0800

Gilbert, I think you're right. The code path doesn't exist in 1.5.0.

On Mar 2, 2018 9:36 AM, "Chun-Hung Hsiao" <chhs...@mesosphere.io> wrote:


> This is a new behavior we have after solving MESOS-1720, and thus a new
> problem only in 1.5.x. Prior to 1.5, reordered tasks (to the same executor)
> would all be launched, because whichever came first would launch the
> executor. Since 1.5, one of them might be dropped.
>
> On Mar 1, 2018 4:36 PM, "Gilbert Song" <gilb...@mesosphere.io> wrote:
>
>> Meng,
>>
>> Could you double check if this is really an issue in Mesos 1.5.0 release?
>>
>> MESOS-1720 <https://issues.apache.org/jira/browse/MESOS-1720> was resolved
>> after the 1.5 release (rc-2), and it seems to be only on the master branch
>> and the 1.5.x branch (not in 1.5.0).
>>
>> Did I miss anything?
>>
>> - Gilbert
>>
>> On Thu, Mar 1, 2018 at 4:22 PM, Benjamin Mahler <bmah...@apache.org> wrote:
>>
>> > Put another way, we currently don't guarantee in-order task delivery to
>> > the executor. Due to the changes for MESOS-1720, one special case of task
>> > re-ordering now leads to the re-ordered task being dropped (rather than
>> > delivered out-of-order as before). Technically, this is strictly better.
>> >
>> > However, we'd like to start guaranteeing in-order task delivery.
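One common way to get in-order delivery is a reorder buffer. It is shown here only to illustrate the general technique, not as the planned Mesos change. The sketch assumes that each launch for a given executor carries a sequence number (an assumption; no such field is mentioned in this thread). Any launch that arrives early is held until all of its predecessors have been handled. `ReorderBuffer` and its methods are hypothetical names.

```python
class ReorderBuffer:
    """Hypothetical per-executor buffer that releases launches in sequence order.

    Assumes every launch is stamped with a consecutive sequence number starting
    at 0; this is an illustrative assumption, not a Mesos message field.
    """

    def __init__(self):
        self.next_seq = 0   # sequence number of the next launch to release
        self.pending = {}   # launches that arrived early, keyed by sequence number

    def receive(self, seq, task):
        """Accept a possibly out-of-order launch; return launches now deliverable in order."""
        self.pending[seq] = task
        released = []
        # Release every consecutive launch starting from next_seq.
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


buf = ReorderBuffer()
print(buf.receive(1, "task-2"))  # [] : task-2 waits for task-1
print(buf.receive(0, "task-1"))  # ['task-1', 'task-2']
```

The trade-off is that a late launch delays every launch queued behind it for the same executor, instead of that launch being dropped or delivered out of order.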
>> >
>> > On Thu, Mar 1, 2018 at 2:56 PM, Meng Zhu <m...@mesosphere.com> wrote:
>> >
>> >> Hi all:
>> >>
>> >> TL;DR: In Mesos 1.5, tasks may be explicitly dropped by the agent
>> >> if all three of the following conditions are met:
>> >> (1) Several `LAUNCH` or `LAUNCH_GROUP` calls use the same executor.
>> >> (2) The executor does not yet exist on the agent.
>> >> (3) Due to some race conditions, these tasks are processed by the agent
>> >> in a different order from their original launch order.
>> >>
>> >> In this case, tasks that the agent tries to launch before the first task
>> >> in the original order will be explicitly dropped by the agent
>> >> (`TASK_DROPPED` or `TASK_LOST` will be sent).
>> >>
>> >> This bug will be fixed in 1.5.1. It is tracked in
>> >> https://issues.apache.org/jira/browse/MESOS-8624
>> >>
>> >> ----
>> >>
>> >> In https://issues.apache.org/jira/browse/MESOS-1720, we introduced an
>> >> ordering dependency between two `LAUNCH`/`LAUNCH_GROUP` calls to a new
>> >> executor. The master specifies that the first call is the one that
>> >> launches the new executor, through the `launch_executor` field in
>> >> `RunTaskMessage`/`RunTaskGroupMessage`, and that the second call should
>> >> use the existing executor launched by the first one.
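Below is a simplified model of the rule described above. It assumes each launch can be reduced to a task ID, an executor ID and the `launch_executor` flag. The `Launch` class and the `process()` function are hypothetical and ignore everything else the agent checks. Passing `designated_launcher=False` approximates the pre-1.5 behavior, where whichever launch arrives first starts the executor.

```python
from dataclasses import dataclass


@dataclass
class Launch:
    task_id: str
    executor_id: str
    launch_executor: bool  # set by the master: True only for the call that starts the executor


def process(arrivals, designated_launcher=True):
    """Return (launched, dropped) task IDs for launches handled in arrival order.

    designated_launcher=True models the post-MESOS-1720 rule from this thread;
    False models "whoever comes first launches the executor". Simplified sketch:
    a launch whose executor is already running is always accepted here.
    """
    running_executors = set()
    launched, dropped = [], []
    for launch in arrivals:
        if launch.executor_id in running_executors:
            launched.append(launch.task_id)
        elif launch.launch_executor or not designated_launcher:
            running_executors.add(launch.executor_id)
            launched.append(launch.task_id)
        else:
            # Executor is absent and this call was not designated to start it: drop.
            dropped.append(launch.task_id)
    return launched, dropped


t1 = Launch("t1", "e1", launch_executor=True)
t2 = Launch("t2", "e1", launch_executor=False)
print(process([t2, t1]))                             # (['t1'], ['t2'])
print(process([t2, t1], designated_launcher=False))  # (['t2', 't1'], [])
```

With the arrival order reversed, `t2` is dropped under the new rule but both tasks launch under the old one, which is the difference Chun-Hung describes at the top of the thread.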
>> >>
>> >> On the agent side, running a task/task group goes through a series of
>> >> continuations. One is `collect()` on the futures that unschedule
>> >> frameworks from being GC'ed:
>> >> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
>> >> Another is `collect()` on task authorization:
>> >> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
>> >> Since these `collect()` calls run on individual actors, the futures
>> >> returned by the `collect()` calls for two `LAUNCH`/`LAUNCH_GROUP` calls
>> >> may complete out of order, even if the futures that the two `collect()`
>> >> calls wait on are satisfied in order (which is true in both of these
>> >> cases).
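The following toy model is deterministic. It shows how continuations handled on separate actors can be reordered even when their inputs are satisfied in order. Each launch's continuation is appended to the mailbox of the actor named in `assignments`, in the order its input was satisfied. `run_order` stands in for whichever actor the runtime happens to schedule next. The `deliver()` function is hypothetical. Real libprocess scheduling is concurrent and is not controlled by the caller.

```python
from collections import defaultdict, deque


def deliver(assignments, run_order):
    """Return the order in which launches leave their collect() step.

    assignments: (actor, launch) pairs, listed in the order the inputs were satisfied.
    run_order:   the actor the (hypothetical) runtime schedules at each step.
    """
    mailboxes = defaultdict(deque)
    for actor, launch in assignments:
        mailboxes[actor].append(launch)  # each actor's mailbox is FIFO
    # Each scheduled actor processes one message from its own mailbox.
    return [mailboxes[actor].popleft() for actor in run_order]


per_call = [("collect-1", "launch-1"), ("collect-2", "launch-2")]
print(deliver(per_call, ["collect-2", "collect-1"]))  # ['launch-2', 'launch-1']

shared = [("agent", "launch-1"), ("agent", "launch-2")]
print(deliver(shared, ["agent", "agent"]))            # ['launch-1', 'launch-2']
```

With one actor per `collect()`, scheduling the second actor first lets launch-2 overtake launch-1. With a single shared actor, the FIFO mailbox preserves the original order.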
>> >>
>> >> As a result, under some race conditions (probably under heavy load),
>> >> tasks that rely on a previous task to launch the executor may be
>> >> processed before the task that is supposed to launch the executor,
>> >> resulting in those tasks being explicitly dropped by the agent.
>> >>
>> >> -Meng
>> >>
>> >>
>> >>
>> >
>>
>
