Re: Collecting futures in the same actor in libprocess

2018-03-01 Thread Benjamin Mahler
Chatted offline with Chun and Meng and suggested we take an explicit
approach of using process::Sequence to ensure ordered task delivery (this
would need to be done both in the master and agent).

On Thu, Mar 1, 2018 at 1:17 PM, Chun-Hung Hsiao 
wrote:

> Some background for the bug AlexR and Meng found:
>
> In https://issues.apache.org/jira/browse/MESOS-1720,
> we introduce an ordering dependency between two `LAUNCH`/`LAUNCH_GROUP`
> calls to a new executor.
> The master would specify that the first call is the one to launch a new
> executor
> through the `launch_executor` field in
> `RunTaskMessage`/`RunTaskGroupMessage`,
> and the second one should use the existing executor launched by the first
> call.
> At the agent side, it will drop any task that want to launch an executor
> which is already existing,
> or any task that want to run on a non-existent executor.
>
> Running a task/task group goes through a series of continuations,
> one is `collect()` on the future that unschedule frameworks from being
> GC'ed:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
> another is `collect()` on task authorization:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
> Since these `collect()` calls run on individual actors, the futures of the
> `collect()` calls for
> two `LAUNCH`/`LAUNCH_GROUP` calls may returns out-of-order,
> even if the futures these two `collect()` wait for are satisfied in order
> (which is true).
>
> The result is that, if this race condition is triggered,
> the agent will try to run the second task/task group before the first one,
> and since the executor is supposed to be launched by the first one,
> the agent will end up sending `TASK_DROPPED` for the second call.
>
> If we can have an interface to make sure that `collect()` returns in the
> same order
> of their dependent futures, this can be avoided.
>
> On Mar 1, 2018 12:50 PM, "Benjamin Mahler"  wrote:
>
> > Could you explain the problem in more detail?
> >
> > On Thu, Mar 1, 2018 at 12:15 PM Chun-Hung Hsiao 
> > wrote:
> >
> > > Hi all,
> > >
> > > Meng found a bug in `slave.cpp`, where the proper fix requires
> collecting
> > > futures in order. Currently every `collect` call spawns it's own actor,
> > so
> > > for two `collect` calls, even though their futures are satisfied in
> > order,
> > > they may finish out-of-order. So we need some libprocess changes to
> have
> > > the ability to collect futures in the same actor. Here I have two
> > > proposals:
> > >
> > > 1. Add a new `collect` interface that takes an actor as a parameter.
> > >
> > > 2. Introduce `process::Executor::collect()` for this.
> > >
> > > Any opinion on these two options?
> > >
> >
>


Re: Collecting futures in the same actor in libprocess

2018-03-01 Thread Chun-Hung Hsiao
Some background for the bug AlexR and Meng found:

In https://issues.apache.org/jira/browse/MESOS-1720,
we introduce an ordering dependency between two `LAUNCH`/`LAUNCH_GROUP`
calls to a new executor.
The master would specify that the first call is the one to launch a new
executor
through the `launch_executor` field in
`RunTaskMessage`/`RunTaskGroupMessage`,
and the second one should use the existing executor launched by the first
call.
At the agent side, it will drop any task that want to launch an executor
which is already existing,
or any task that want to run on a non-existent executor.

Running a task/task group goes through a series of continuations,
one is `collect()` on the future that unschedule frameworks from being
GC'ed:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
another is `collect()` on task authorization:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
Since these `collect()` calls run on individual actors, the futures of the
`collect()` calls for
two `LAUNCH`/`LAUNCH_GROUP` calls may returns out-of-order,
even if the futures these two `collect()` wait for are satisfied in order
(which is true).

The result is that, if this race condition is triggered,
the agent will try to run the second task/task group before the first one,
and since the executor is supposed to be launched by the first one,
the agent will end up sending `TASK_DROPPED` for the second call.

If we can have an interface to make sure that `collect()` returns in the
same order
of their dependent futures, this can be avoided.

On Mar 1, 2018 12:50 PM, "Benjamin Mahler"  wrote:

> Could you explain the problem in more detail?
>
> On Thu, Mar 1, 2018 at 12:15 PM Chun-Hung Hsiao 
> wrote:
>
> > Hi all,
> >
> > Meng found a bug in `slave.cpp`, where the proper fix requires collecting
> > futures in order. Currently every `collect` call spawns it's own actor,
> so
> > for two `collect` calls, even though their futures are satisfied in
> order,
> > they may finish out-of-order. So we need some libprocess changes to have
> > the ability to collect futures in the same actor. Here I have two
> > proposals:
> >
> > 1. Add a new `collect` interface that takes an actor as a parameter.
> >
> > 2. Introduce `process::Executor::collect()` for this.
> >
> > Any opinion on these two options?
> >
>


Re: Collecting futures in the same actor in libprocess

2018-03-01 Thread Benjamin Mahler
Could you explain the problem in more detail?

On Thu, Mar 1, 2018 at 12:15 PM Chun-Hung Hsiao 
wrote:

> Hi all,
>
> Meng found a bug in `slave.cpp`, where the proper fix requires collecting
> futures in order. Currently every `collect` call spawns it's own actor, so
> for two `collect` calls, even though their futures are satisfied in order,
> they may finish out-of-order. So we need some libprocess changes to have
> the ability to collect futures in the same actor. Here I have two
> proposals:
>
> 1. Add a new `collect` interface that takes an actor as a parameter.
>
> 2. Introduce `process::Executor::collect()` for this.
>
> Any opinion on these two options?
>


Collecting futures in the same actor in libprocess

2018-03-01 Thread Chun-Hung Hsiao
Hi all,

Meng found a bug in `slave.cpp`, where the proper fix requires collecting
futures in order. Currently every `collect` call spawns it's own actor, so
for two `collect` calls, even though their futures are satisfied in order,
they may finish out-of-order. So we need some libprocess changes to have
the ability to collect futures in the same actor. Here I have two proposals:

1. Add a new `collect` interface that takes an actor as a parameter.

2. Introduce `process::Executor::collect()` for this.

Any opinion on these two options?