Re: Collecting futures in the same actor in libprocess
Chatted offline with Chun and Meng and suggested we take an explicit approach of using process::Sequence to ensure ordered task delivery (this would need to be done both in the master and agent). On Thu, Mar 1, 2018 at 1:17 PM, Chun-Hung Hsiao wrote: > Some background for the bug AlexR and Meng found: > > In https://issues.apache.org/jira/browse/MESOS-1720, > we introduce an ordering dependency between two `LAUNCH`/`LAUNCH_GROUP` > calls to a new executor. > The master would specify that the first call is the one to launch a new > executor > through the `launch_executor` field in > `RunTaskMessage`/`RunTaskGroupMessage`, > and the second one should use the existing executor launched by the first > call. > At the agent side, it will drop any task that want to launch an executor > which is already existing, > or any task that want to run on a non-existent executor. > > Running a task/task group goes through a series of continuations, > one is `collect()` on the future that unschedule frameworks from being > GC'ed: > https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158 > another is `collect()` on task authorization: > https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333 > Since these `collect()` calls run on individual actors, the futures of the > `collect()` calls for > two `LAUNCH`/`LAUNCH_GROUP` calls may returns out-of-order, > even if the futures these two `collect()` wait for are satisfied in order > (which is true). > > The result is that, if this race condition is triggered, > the agent will try to run the second task/task group before the first one, > and since the executor is supposed to be launched by the first one, > the agent will end up sending `TASK_DROPPED` for the second call. > > If we can have an interface to make sure that `collect()` returns in the > same order > of their dependent futures, this can be avoided. > > On Mar 1, 2018 12:50 PM, "Benjamin Mahler" wrote: > > > Could you explain the problem in more detail? > > > > On Thu, Mar 1, 2018 at 12:15 PM Chun-Hung Hsiao > > wrote: > > > > > Hi all, > > > > > > Meng found a bug in `slave.cpp`, where the proper fix requires > collecting > > > futures in order. Currently every `collect` call spawns it's own actor, > > so > > > for two `collect` calls, even though their futures are satisfied in > > order, > > > they may finish out-of-order. So we need some libprocess changes to > have > > > the ability to collect futures in the same actor. Here I have two > > > proposals: > > > > > > 1. Add a new `collect` interface that takes an actor as a parameter. > > > > > > 2. Introduce `process::Executor::collect()` for this. > > > > > > Any opinion on these two options? > > > > > >
Re: Collecting futures in the same actor in libprocess
Some background for the bug AlexR and Meng found: In https://issues.apache.org/jira/browse/MESOS-1720, we introduce an ordering dependency between two `LAUNCH`/`LAUNCH_GROUP` calls to a new executor. The master would specify that the first call is the one to launch a new executor through the `launch_executor` field in `RunTaskMessage`/`RunTaskGroupMessage`, and the second one should use the existing executor launched by the first call. At the agent side, it will drop any task that want to launch an executor which is already existing, or any task that want to run on a non-existent executor. Running a task/task group goes through a series of continuations, one is `collect()` on the future that unschedule frameworks from being GC'ed: https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158 another is `collect()` on task authorization: https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333 Since these `collect()` calls run on individual actors, the futures of the `collect()` calls for two `LAUNCH`/`LAUNCH_GROUP` calls may returns out-of-order, even if the futures these two `collect()` wait for are satisfied in order (which is true). The result is that, if this race condition is triggered, the agent will try to run the second task/task group before the first one, and since the executor is supposed to be launched by the first one, the agent will end up sending `TASK_DROPPED` for the second call. If we can have an interface to make sure that `collect()` returns in the same order of their dependent futures, this can be avoided. On Mar 1, 2018 12:50 PM, "Benjamin Mahler" wrote: > Could you explain the problem in more detail? > > On Thu, Mar 1, 2018 at 12:15 PM Chun-Hung Hsiao > wrote: > > > Hi all, > > > > Meng found a bug in `slave.cpp`, where the proper fix requires collecting > > futures in order. Currently every `collect` call spawns it's own actor, > so > > for two `collect` calls, even though their futures are satisfied in > order, > > they may finish out-of-order. So we need some libprocess changes to have > > the ability to collect futures in the same actor. Here I have two > > proposals: > > > > 1. Add a new `collect` interface that takes an actor as a parameter. > > > > 2. Introduce `process::Executor::collect()` for this. > > > > Any opinion on these two options? > > >
Re: Collecting futures in the same actor in libprocess
Could you explain the problem in more detail? On Thu, Mar 1, 2018 at 12:15 PM Chun-Hung Hsiao wrote: > Hi all, > > Meng found a bug in `slave.cpp`, where the proper fix requires collecting > futures in order. Currently every `collect` call spawns it's own actor, so > for two `collect` calls, even though their futures are satisfied in order, > they may finish out-of-order. So we need some libprocess changes to have > the ability to collect futures in the same actor. Here I have two > proposals: > > 1. Add a new `collect` interface that takes an actor as a parameter. > > 2. Introduce `process::Executor::collect()` for this. > > Any opinion on these two options? >
Collecting futures in the same actor in libprocess
Hi all, Meng found a bug in `slave.cpp`, where the proper fix requires collecting futures in order. Currently every `collect` call spawns it's own actor, so for two `collect` calls, even though their futures are satisfied in order, they may finish out-of-order. So we need some libprocess changes to have the ability to collect futures in the same actor. Here I have two proposals: 1. Add a new `collect` interface that takes an actor as a parameter. 2. Introduce `process::Executor::collect()` for this. Any opinion on these two options?