[GitHub] mesos pull request #268: Don't mention the deceased mesos-health-check binar...

2018-03-01 Thread benjaminp
GitHub user benjaminp opened a pull request:

https://github.com/apache/mesos/pull/268

Don't mention the deceased mesos-health-check binary in docs for 
--launcher_dir.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/benjaminp/mesos no-health-check

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mesos/pull/268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #268


commit 76072d0c1ea6423bce0a790369c604088b9b18b6
Author: Benjamin Peterson 
Date:   2018-03-02T04:16:29Z

Don't mention the deceased mesos-health-check binary in docs for 
--launcher_dir.




---


Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-01 Thread Gilbert Song
Meng,

Could you double check if this is really an issue in Mesos 1.5.0 release?

MESOS-1720 was resolved
after the 1.5.0 release (rc-2), and it seems like
it is only on the master branch and the 1.5.x branch (not in 1.5.0).

Did I miss anything?

- Gilbert

On Thu, Mar 1, 2018 at 4:22 PM, Benjamin Mahler  wrote:

> Put another way, we currently don't guarantee in-order task delivery to
> the executor. Due to the changes for MESOS-1720, one special case of task
> re-ordering now leads to the re-ordered task being dropped (rather than
> delivered out-of-order as before). Technically, this is strictly better.
>
> However, we'd like to start guaranteeing in-order task delivery.
>
> On Thu, Mar 1, 2018 at 2:56 PM, Meng Zhu  wrote:
>
>> Hi all:
>>
>> TLDR: In Mesos 1.5, tasks may be explicitly dropped by the agent
>> if all three conditions are met:
>> (1) Several `LAUNCH_TASK` or `LAUNCH_GROUP` calls
>>  use the same executor.
>> (2) The executor currently does not exist on the agent.
>> (3) Due to some race conditions, these tasks are trying to launch
>> on the agent in a different order from their original launch order.
>>
>> In this case, tasks that are trying to launch on the agent
>> before the first task in the original order will be explicitly dropped by
>> the agent (`TASK_DROPPED` or `TASK_LOST` will be sent).
>>
>> This bug will be fixed in 1.5.1. It is tracked in
>> https://issues.apache.org/jira/browse/MESOS-8624
>>
>> 
>>
>> In https://issues.apache.org/jira/browse/MESOS-1720, we introduced an
>> ordering dependency between two `LAUNCH`/`LAUNCH_GROUP`
>> calls to a new executor. The master would specify that the first call is
>> the
>> one to launch a new executor through the `launch_executor` field in
>> `RunTaskMessage`/`RunTaskGroupMessage`, and the second one should
>> use the existing executor launched by the first one.
>>
>> On the agent side, running a task/task group goes through a series of
>> continuations. One is `collect()` on the future that unschedules the
>> framework from being GC'ed:
>> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
>> Another is `collect()` on task authorization:
>> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
>> Since these `collect()` calls run on individual actors, the futures of the
>> `collect()` calls for two `LAUNCH`/`LAUNCH_GROUP` calls may return
>> out-of-order, even if the futures these two `collect()` calls wait for are
>> satisfied in order (which is true in these two cases).
>>
>> As a result, under some race conditions (probably under heavy load),
>> tasks that rely on a previous task to launch the executor may get
>> processed before the task that is supposed to launch that executor,
>> resulting in those tasks being explicitly dropped by the agent.
>>
>> -Meng
>>
>>
>>
>


Re: Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-01 Thread Benjamin Mahler
Put another way, we currently don't guarantee in-order task delivery to the
executor. Due to the changes for MESOS-1720, one special case of task
re-ordering now leads to the re-ordered task being dropped (rather than
delivered out-of-order as before). Technically, this is strictly better.

However, we'd like to start guaranteeing in-order task delivery.

On Thu, Mar 1, 2018 at 2:56 PM, Meng Zhu  wrote:

> Hi all:
>
> TLDR: In Mesos 1.5, tasks may be explicitly dropped by the agent
> if all three conditions are met:
> (1) Several `LAUNCH_TASK` or `LAUNCH_GROUP` calls
>  use the same executor.
> (2) The executor currently does not exist on the agent.
> (3) Due to some race conditions, these tasks are trying to launch
> on the agent in a different order from their original launch order.
>
> In this case, tasks that are trying to launch on the agent
> before the first task in the original order will be explicitly dropped by
> the agent (`TASK_DROPPED` or `TASK_LOST` will be sent).
>
> This bug will be fixed in 1.5.1. It is tracked in
> https://issues.apache.org/jira/browse/MESOS-8624
>
> 
>
> In https://issues.apache.org/jira/browse/MESOS-1720, we introduced an
> ordering dependency between two `LAUNCH`/`LAUNCH_GROUP`
> calls to a new executor. The master would specify that the first call is
> the
> one to launch a new executor through the `launch_executor` field in
> `RunTaskMessage`/`RunTaskGroupMessage`, and the second one should
> use the existing executor launched by the first one.
>
> On the agent side, running a task/task group goes through a series of
> continuations. One is `collect()` on the future that unschedules the
> framework from being GC'ed:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
> Another is `collect()` on task authorization:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
> Since these `collect()` calls run on individual actors, the futures of the
> `collect()` calls for two `LAUNCH`/`LAUNCH_GROUP` calls may return
> out-of-order, even if the futures these two `collect()` calls wait for are
> satisfied in order (which is true in these two cases).
>
> As a result, under some race conditions (probably under heavy load),
> tasks that rely on a previous task to launch the executor may get
> processed before the task that is supposed to launch that executor,
> resulting in those tasks being explicitly dropped by the agent.
>
> -Meng
>
>
>


Tasks may be explicitly dropped by agent in Mesos 1.5

2018-03-01 Thread Meng Zhu
Hi all:

TLDR: In Mesos 1.5, tasks may be explicitly dropped by the agent
if all three conditions are met:
(1) Several `LAUNCH_TASK` or `LAUNCH_GROUP` calls
 use the same executor.
(2) The executor currently does not exist on the agent.
(3) Due to some race conditions, these tasks are trying to launch
on the agent in a different order from their original launch order.

In this case, tasks that are trying to launch on the agent
before the first task in the original order will be explicitly dropped by
the agent (`TASK_DROPPED` or `TASK_LOST` will be sent).

This bug will be fixed in 1.5.1. It is tracked in
https://issues.apache.org/jira/browse/MESOS-8624



In https://issues.apache.org/jira/browse/MESOS-1720, we introduced an
ordering dependency between two `LAUNCH`/`LAUNCH_GROUP`
calls to a new executor. The master would specify that the first call is the
one to launch a new executor through the `launch_executor` field in
`RunTaskMessage`/`RunTaskGroupMessage`, and the second one should
use the existing executor launched by the first one.
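
To make the gating implied by `launch_executor` concrete, here is a minimal,
self-contained sketch of the agent-side decision. This is not the actual
slave.cpp logic; `shouldDrop` and its arguments are hypothetical names used
only for illustration:

```cpp
// A simplified sketch (not the actual slave.cpp logic) of how the agent can
// end up dropping a task once `launch_executor` is in play. The names
// `shouldDrop`, `launchExecutor`, and `executorExists` are illustrative
// stand-ins, not Mesos identifiers.
#include <iostream>
#include <set>
#include <string>

// Returns true if the task should be dropped given the master's intent
// (`launchExecutor`) and the agent's current view of the executor.
bool shouldDrop(bool launchExecutor, bool executorExists)
{
  // Master said "launch a new executor", but one already exists: drop.
  if (launchExecutor && executorExists) {
    return true;
  }

  // Master said "reuse the existing executor", but none exists yet: drop.
  if (!launchExecutor && !executorExists) {
    return true;
  }

  return false;
}

int main()
{
  std::set<std::string> executors; // Executors currently running on the agent.

  // Task 2 (launch_executor=false) arrives before task 1
  // (launch_executor=true) due to the re-ordering described below: it is
  // dropped because the executor it expects does not exist yet.
  bool dropped = shouldDrop(false, executors.count("executor-1") > 0);
  std::cout << "task 2 dropped: " << std::boolalpha << dropped << std::endl;

  return 0;
}
```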

On the agent side, running a task/task group goes through a series of
continuations. One is `collect()` on the future that unschedules the
framework from being GC'ed:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
Another is `collect()` on task authorization:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
Since these `collect()` calls run on individual actors, the futures of the
`collect()` calls for two `LAUNCH`/`LAUNCH_GROUP` calls may return
out-of-order, even if the futures these two `collect()` calls wait for are
satisfied in order (which is true in these two cases).
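
To see why two independent `collect()` continuations can complete out of
order even though their inputs are satisfied in order, here is a minimal
model using plain C++ threads rather than libprocess actors; the sleep just
stands in for one actor being scheduled late (e.g. under heavy load):

```cpp
// A minimal model (plain C++ threads, not libprocess) of why two independent
// `collect()`-style continuations can complete out of order even when their
// inputs are satisfied in order. Each "actor" here is just a thread that
// waits for its input and then runs its continuation.
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

int main()
{
  std::promise<void> input1; // Satisfied first (in order).
  std::promise<void> input2; // Satisfied second.

  // "Actor" 1: waits for input1, then runs its continuation.
  auto chain1 = std::async(std::launch::async, [&] {
    input1.get_future().wait();
    // Simulate this actor being scheduled late (e.g., under heavy load).
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    std::cout << "continuation 1 ran" << std::endl;
  });

  // "Actor" 2: waits for input2, then runs its continuation.
  auto chain2 = std::async(std::launch::async, [&] {
    input2.get_future().wait();
    std::cout << "continuation 2 ran" << std::endl;
  });

  // The inputs are satisfied strictly in order...
  input1.set_value();
  input2.set_value();

  // ...but "continuation 2 ran" typically prints before "continuation 1 ran",
  // mirroring how the agent can process the second LAUNCH before the first.
  chain1.wait();
  chain2.wait();

  return 0;
}
```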

As a result, under some race conditions (probably under heavy load), tasks
that rely on a previous task to launch the executor may get processed
before the task that is supposed to launch that executor, resulting in
those tasks being explicitly dropped by the agent.

-Meng


Re: Collecting futures in the same actor in libprocess

2018-03-01 Thread Benjamin Mahler
Chatted offline with Chun and Meng and suggested we take an explicit
approach of using process::Sequence to ensure ordered task delivery (this
would need to be done both in the master and agent).
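
For illustration, here is a minimal, self-contained model of what a sequence
buys us. It is not the `process::Sequence` API itself, just the idea of
funnelling all handlers through one worker so they run in the order they
were added:

```cpp
// A minimal model of the "sequence" idea (not the actual process::Sequence
// API): all handlers are enqueued on one worker, so they always run in the
// order they were added, restoring in-order task delivery.
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class SequenceModel
{
public:
  SequenceModel() : done(false), worker([this] { run(); }) {}

  ~SequenceModel()
  {
    {
      std::lock_guard<std::mutex> lock(mutex);
      done = true;
    }
    condition.notify_one();
    worker.join(); // Remaining handlers are drained before exit.
  }

  // Handlers added here run strictly in FIFO order on the single worker.
  void add(std::function<void()> handler)
  {
    {
      std::lock_guard<std::mutex> lock(mutex);
      queue.push(std::move(handler));
    }
    condition.notify_one();
  }

private:
  void run()
  {
    while (true) {
      std::function<void()> handler;
      {
        std::unique_lock<std::mutex> lock(mutex);
        condition.wait(lock, [this] { return done || !queue.empty(); });
        if (queue.empty()) {
          return; // Done and nothing left to run.
        }
        handler = std::move(queue.front());
        queue.pop();
      }
      handler();
    }
  }

  std::mutex mutex;
  std::condition_variable condition;
  std::queue<std::function<void()>> queue;
  bool done;
  std::thread worker;
};

int main()
{
  SequenceModel sequence;
  sequence.add([] { std::cout << "launch task 1" << std::endl; });
  sequence.add([] { std::cout << "launch task 2" << std::endl; });
  return 0; // Always prints "launch task 1" before "launch task 2".
}
```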

On Thu, Mar 1, 2018 at 1:17 PM, Chun-Hung Hsiao 
wrote:

> Some background for the bug AlexR and Meng found:
>
> In https://issues.apache.org/jira/browse/MESOS-1720,
> we introduced an ordering dependency between two `LAUNCH`/`LAUNCH_GROUP`
> calls to a new executor.
> The master would specify that the first call is the one to launch a new
> executor
> through the `launch_executor` field in
> `RunTaskMessage`/`RunTaskGroupMessage`,
> and the second one should use the existing executor launched by the first
> call.
> On the agent side, the agent will drop any task that wants to launch an
> executor that already exists,
> or any task that wants to run on a non-existent executor.
>
> Running a task/task group goes through a series of continuations.
> One is `collect()` on the future that unschedules the framework from being
> GC'ed:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
> Another is `collect()` on task authorization:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
> Since these `collect()` calls run on individual actors, the futures of the
> `collect()` calls for
> two `LAUNCH`/`LAUNCH_GROUP` calls may return out-of-order,
> even if the futures these two `collect()` calls wait for are satisfied in
> order (which is true).
>
> The result is that, if this race condition is triggered,
> the agent will try to run the second task/task group before the first one,
> and since the executor is supposed to be launched by the first one,
> the agent will end up sending `TASK_DROPPED` for the second call.
>
> If we had an interface to make sure that the `collect()` futures complete
> in the same order as their dependent futures, this could be avoided.
>
> On Mar 1, 2018 12:50 PM, "Benjamin Mahler"  wrote:
>
> > Could you explain the problem in more detail?
> >
> > On Thu, Mar 1, 2018 at 12:15 PM Chun-Hung Hsiao 
> > wrote:
> >
> > > Hi all,
> > >
> > > Meng found a bug in `slave.cpp`, where the proper fix requires
> collecting
> > > futures in order. Currently every `collect` call spawns its own actor,
> > so
> > > for two `collect` calls, even though their futures are satisfied in
> > order,
> > > they may finish out-of-order. So we need some libprocess changes to
> have
> > > the ability to collect futures in the same actor. Here I have two
> > > proposals:
> > >
> > > 1. Add a new `collect` interface that takes an actor as a parameter.
> > >
> > > 2. Introduce `process::Executor::collect()` for this.
> > >
> > > Any opinion on these two options?
> > >
> >
>


Re: Collecting futures in the same actor in libprocess

2018-03-01 Thread Chun-Hung Hsiao
Some background for the bug AlexR and Meng found:

In https://issues.apache.org/jira/browse/MESOS-1720,
we introduced an ordering dependency between two `LAUNCH`/`LAUNCH_GROUP`
calls to a new executor.
The master would specify that the first call is the one to launch a new
executor
through the `launch_executor` field in
`RunTaskMessage`/`RunTaskGroupMessage`,
and the second one should use the existing executor launched by the first
call.
On the agent side, the agent will drop any task that wants to launch an
executor that already exists,
or any task that wants to run on a non-existent executor.

Running a task/task group goes through a series of continuations.
One is `collect()` on the future that unschedules the framework from being
GC'ed:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2158
Another is `collect()` on task authorization:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2333
Since these `collect()` calls run on individual actors, the futures of the
`collect()` calls for two `LAUNCH`/`LAUNCH_GROUP` calls may return
out-of-order,
even if the futures these two `collect()` calls wait for are satisfied in
order (which is true).

The result is that, if this race condition is triggered,
the agent will try to run the second task/task group before the first one,
and since the executor is supposed to be launched by the first one,
the agent will end up sending `TASK_DROPPED` for the second call.

If we had an interface to make sure that the `collect()` futures complete
in the same order as their dependent futures, this could be avoided.

On Mar 1, 2018 12:50 PM, "Benjamin Mahler"  wrote:

> Could you explain the problem in more detail?
>
> On Thu, Mar 1, 2018 at 12:15 PM Chun-Hung Hsiao 
> wrote:
>
> > Hi all,
> >
> > Meng found a bug in `slave.cpp`, where the proper fix requires collecting
> > futures in order. Currently every `collect` call spawns its own actor,
> so
> > for two `collect` calls, even though their futures are satisfied in
> order,
> > they may finish out-of-order. So we need some libprocess changes to have
> > the ability to collect futures in the same actor. Here I have two
> > proposals:
> >
> > 1. Add a new `collect` interface that takes an actor as a parameter.
> >
> > 2. Introduce `process::Executor::collect()` for this.
> >
> > Any opinion on these two options?
> >
>


Re: Collecting futures in the same actor in libprocess

2018-03-01 Thread Benjamin Mahler
Could you explain the problem in more detail?

On Thu, Mar 1, 2018 at 12:15 PM Chun-Hung Hsiao 
wrote:

> Hi all,
>
> Meng found a bug in `slave.cpp`, where the proper fix requires collecting
> futures in order. Currently every `collect` call spawns its own actor, so
> for two `collect` calls, even though their futures are satisfied in order,
> they may finish out-of-order. So we need some libprocess changes to have
> the ability to collect futures in the same actor. Here I have two
> proposals:
>
> 1. Add a new `collect` interface that takes an actor as a parameter.
>
> 2. Introduce `process::Executor::collect()` for this.
>
> Any opinion on these two options?
>


Collecting futures in the same actor in libprocess

2018-03-01 Thread Chun-Hung Hsiao
Hi all,

Meng found a bug in `slave.cpp`, where the proper fix requires collecting
futures in order. Currently every `collect` call spawns its own actor, so
for two `collect` calls, even though their futures are satisfied in order,
they may finish out-of-order. So we need some libprocess changes to have
the ability to collect futures in the same actor. Here I have two proposals:

1. Add a new `collect` interface that takes an actor as a parameter.

2. Introduce `process::Executor::collect()` for this.

Any opinion on these two options?
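
For concreteness, here is a purely hypothetical sketch of what proposal (1)
could look like. Neither the overload nor its signature exists in libprocess
today; the parameter and container choices are only illustrative:

```cpp
#include <vector>

#include <process/future.hpp>
#include <process/pid.hpp>

namespace process {

// Hypothetical (sketch of proposal 1, not an existing API): wait for
// `futures` using the actor identified by `pid` for the bookkeeping,
// instead of spawning a fresh actor per call, so that multiple `collect`
// results funnelled through the same actor complete in the order their
// inputs are satisfied.
template <typename T>
Future<std::vector<T>> collect(
    const UPID& pid,
    const std::vector<Future<T>>& futures);

} // namespace process
```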


Re: Anyone using a custom Sorter?

2018-03-01 Thread Jie Yu
If your intention is to kill the Sorter interface, I am +100.

On Wed, Feb 28, 2018 at 2:12 PM, Michael Park  wrote:

> I'm not even sure if anyone's using a custom Allocator, but
> is anyone using a custom Sorter? It doesn't seem like there's
> even a module for it so it wouldn't be dynamically loaded.
>
> Perhaps you have a fork with a custom Sorter?
>
> Please let me know,
>
> Thanks!
>
> MPark
>


Re: Authorization Logging

2018-03-01 Thread Alexander Rojas
This is a good question: where should the audit happen, in the
authorization module itself or in the caller? It doesn’t help that you can
authorize using approvers, the authorizer, or the not-so-long-ago introduced
acceptors. There are also function wrappers that help to do so.

The feeling we have had in the past is that the authorizer interface was
created to accommodate the needs of the people writing authorization modules,
but not so much its use inside our code base. That’s why I’ve been working on
a set of patches to clean up the code that calls authorization a bit, based
on ideas from BenH: https://reviews.apache.org/r/65311/
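
If we did want logging to be the concern of the authorizer itself, one option
is a thin decorator around whatever authorizer is configured. Below is a
minimal sketch of that idea using a deliberately simplified interface; the
real Mesos `Authorizer` interface and request types differ, so treat these
names as stand-ins:

```cpp
// A minimal sketch of the "logging is the authorizer's concern" option: a
// decorator that logs every request and its outcome before returning the
// decision. `SimpleAuthorizer` and `Request` are deliberately simplified
// stand-ins, not the real Mesos Authorizer API.
#include <iostream>
#include <memory>
#include <string>
#include <utility>

struct Request
{
  std::string principal; // Who is asking.
  std::string action;    // What they want to do.
  std::string object;    // What they want to do it to.
};

class SimpleAuthorizer
{
public:
  virtual ~SimpleAuthorizer() = default;
  virtual bool authorized(const Request& request) = 0;
};

// Decorator: logs each request and its outcome, then forwards the decision.
class LoggingAuthorizer : public SimpleAuthorizer
{
public:
  explicit LoggingAuthorizer(std::unique_ptr<SimpleAuthorizer> wrapped)
    : wrapped_(std::move(wrapped)) {}

  bool authorized(const Request& request) override
  {
    const bool decision = wrapped_->authorized(request);
    // In Mesos this could go to a dedicated audit log instead of stderr.
    std::cerr << "Authorizing principal '" << request.principal
              << "' to " << request.action
              << " '" << request.object << "': "
              << (decision ? "ALLOWED" : "DENIED") << std::endl;
    return decision;
  }

private:
  std::unique_ptr<SimpleAuthorizer> wrapped_;
};

// Trivial example policy: allow everything.
class PermissiveAuthorizer : public SimpleAuthorizer
{
public:
  bool authorized(const Request&) override { return true; }
};

int main()
{
  LoggingAuthorizer authorizer(std::make_unique<PermissiveAuthorizer>());
  authorizer.authorized({"operator@example", "TEARDOWN_FRAMEWORK", "framework-1"});
  return 0;
}
```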

Reviews/comments always welcomed

Alexander Rojas
alexander.ro...@gmail.com




> On 28. Feb 2018, at 23:52, Benjamin Mahler  wrote:
> 
> When touching some code, I noticed that authorization logging is currently 
> done rather inconsistently across the call-sites and many cases do not log 
> the request:
> 
> $ grep -R -A 3 'LOG.*Authorizing' src
> 
> Should authorization logging be the concern of an authorizer implementation? 
> For audit purposes I could imagine this also being part of a separate log 
> that the authorizer maintains?
> 
> Ben