Re: Welcome James Peach as a new committer and PMC member!

2017-09-11 Thread Sam
Congrats, way to go 


Regards,
Sam Chen | APJ Country Director | DC/OS Evangelist
>   Build and run modern apps
> at scale using DC/OS


> On Sep 7, 2017, at 5:08 AM, Yan Xu  wrote:
> 
> Hi Mesos devs and users,
> 
> Please welcome James Peach as a new Apache Mesos committer and PMC member.
> 
> James has been an active contributor to Mesos for over two years now. He has 
> made many great contributions to the project, including the XFS disk isolator, 
> improvements to Linux capabilities support, and the IPC namespace isolator. He's 
> super active on the mailing lists and Slack channels, always eager to help 
> folks in the community and he has been helping with a lot of Mesos reviews as 
> well.
> 
> Here is his formal committer candidate checklist:
> 
> https://docs.google.com/document/d/19G5zSxhrRBdS6GXn9KjCznjX3cp0mUbck6Jy1Hgn3RY/edit?usp=sharing
>  
> 
> Congrats James!
> 
> Yan
> 


Re: Adding process::Executor::execute()

2017-09-11 Thread Benjamin Hindman
Quick clarification: you'll have a single `process::Executor` and queue up
all the rmdirs on that, correct? So you'll still tie up a worker thread,
but only one of them.

Either way, it makes sense to add `process::Executor::execute()`. I'm happy
to shepherd that for you, Chun; send me a patch!

On Mon, Sep 11, 2017 at 7:32 PM, Chun-Hung Hsiao 
wrote:

> Hi,
>
> I'm thinking about extending `process::Executor` with a new `execute()`
> interface.
> The need for this new interface surfaced while I was working on
> https://issues.apache.org/jira/browse/MESOS-7964
> Summary:
> 1. A disk GC might execute multiple `rmdirs` callbacks, and some of them
> are heavy duty. We don't want to run them on `GarbageCollectorProcess` so
> that it won't block other events of the process.
> Currently we do the following:
> async(rmdirs).onAny(...);
> 2. `async` puts each `rmdirs` callback in an actor. When there are many
> heavy-duty `rmdirs` callbacks, the actors end up occupying all worker
> threads and blocking other actors for minutes.
>
> Yan suggested that I use `process::Executor` such that:
> 1. The `rmdirs` callbacks are not executed on `GarbageCollectorProcess`
> 2. All `rmdirs` callbacks are executed on a single thread
> Since the `Executor` class only contains a `defer()` function that returns
> a `_Deferred` structure,
> I'm doing the following:
> executor.defer(rmdirs).operator std::function )>()().onAny(...)
>
> Would it make sense to add another `execute()` function to directly return
> a `Future`?
>
> - Chun-Hung
>
>


-- 
Benjamin Hindman
Founder of Mesosphere and Co-Creator of Apache Mesos
Mesosphere Inc.  

Follow us on Twitter: @mesosphere 

good evening

2017-09-11 Thread Incia Anand
I'm trying to study Mesos for my project, and I need to create some tasks and
run health checks on them. Can you tell me exactly how I should create the
tasks to do so?


Re: Catching the webui up to features

2017-09-11 Thread Aaron Wood
Hey Ben,

There was a change sometime after Mesos 1.0.x (I think) that altered how
the leader state was obtained:
https://github.com/apache/mesos/blob/master/src/webui/master/static/js/controllers.js#L362-L364

This seems to be an intentional change, but it causes UI error modals to
pop up continually, stating that the leader could not be reached, when the
UI is accessed over a tunnel. The errors also show up when you try to access
agent info, among other things, throughout the Angular app.

I'm curious: how many people using Mesos access the UI over a tunnel? Is
there any harm in reverting this functionality to how it was in the 1.0.x
days? I'm sure there was an important reason for this change; I'm just not
aware of it. As I understand it, many of the ops folks in our org use the UI,
almost always over a tunnel. Once more groups in the org upgrade to newer
versions of Mesos, they will be facing this issue almost daily.

Thanks,
Aaron

On Mon, Sep 11, 2017 at 2:59 PM, Benjamin Mahler  wrote:

> Hi folks,
>
> Over time the webui has lagged behind for some of the features that have
> been added. I'm currently tracking what's required to catch it up here:
>
> https://issues.apache.org/jira/browse/MESOS-6440
>
> If you know of other features that make sense to display in the webui, feel
> free to file a ticket under this epic (or link it as related if it falls
> under a different epic) and let me know. For example, I just filed another
> one within it for displaying task health information.
>
> Also feel free to make contributions to the webui even if you don't feel
> that you're knowledgeable on the frontend side of things. The majority of
> webui changes are very easy and provide a lot of value to users who
> interact with it on a regular basis!
>
> If you'd like to contribute to the webui, there are a lot of easy tickets
> to get started with; here is one example that I would be happy to assist
> with: https://issues.apache.org/jira/browse/MESOS-7962
>
> Thanks!
> Ben
>


About the Mesos authorization

2017-09-11 Thread j...@is-land.com.tw
Hi all:
Why doesn't Mesos authorization support LDAP or Kerberos?

I am thinking of implementing a Mesos module for authorization.


Thank you.


Adding process::Executor::execute()

2017-09-11 Thread Chun-Hung Hsiao
Hi,

I'm thinking about extending `process::Executor` with a new `execute()`
interface.
The need for this new interface surfaced while I was working on
https://issues.apache.org/jira/browse/MESOS-7964
Summary:
1. A disk GC might execute multiple `rmdirs` callbacks, and some of them
are heavy duty. We don't want to run them on `GarbageCollectorProcess` so
that it won't block other events of the process.
Currently we do the following:
async(rmdirs).onAny(...);
2. `async` puts each `rmdirs` callback in an actor. When there are many
heavy-duty `rmdirs` callbacks, the actors end up occupying all worker
threads and blocking other actors for minutes.

Yan suggested that I use `process::Executor` such that:
1. The `rmdirs` callbacks are not executed on `GarbageCollectorProcess`
2. All `rmdirs` callbacks are executed on a single thread
Since the `Executor` class only contains a `defer()` function that returns
a `_Deferred` structure,
I'm doing the following:
executor.defer(rmdirs).operator std::function()().onAny(...)

Would it make sense to add another `execute()` function to directly return
a `Future`?

- Chun-Hung
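To make the proposal concrete outside libprocess, here is a minimal C++17 sketch of a single-threaded executor whose `execute()` returns a future directly. It is an analogy only: it uses `std::future` and `std::packaged_task` in place of `process::Future` and libprocess actors, and the class name `SerialExecutor` is hypothetical, not part of the `process::Executor` API.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>

// Hypothetical SerialExecutor (not libprocess): runs every submitted callable,
// in FIFO order, on one dedicated worker thread. execute() returns a
// std::future directly, so no _Deferred conversion is needed at the call site.
class SerialExecutor {
public:
  SerialExecutor() : worker_([this] { run(); }) {}

  ~SerialExecutor() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // The worker drains any queued jobs before exiting.
  }

  SerialExecutor(const SerialExecutor&) = delete;
  SerialExecutor& operator=(const SerialExecutor&) = delete;

  // Queues `f` and returns a future for its result.
  template <typename F>
  std::future<std::invoke_result_t<F>> execute(F&& f) {
    using R = std::invoke_result_t<F>;
    auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(f));
    std::future<R> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(!stopping_);
      queue_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

private:
  void run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) {
          return;  // Stopping and nothing left to run.
        }
        job = std::move(queue_.front());
        queue_.pop();
      }
      job();  // Run outside the lock so execute() is never blocked by a job.
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool stopping_ = false;
  std::thread worker_;  // Declared last so it starts after the state above.
};
```

Because every job shares the one worker thread, queueing all the `rmdirs` work on a single such executor occupies at most one thread, which matches the trade-off Benjamin Hindman describes in his reply; the price is that a long job delays the ones queued behind it.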


Re: Kubernetes/Mesos? Yes? No? When?

2017-09-11 Thread j...@is-land.com.tw


On 2017-08-20 05:43, butech tech  wrote: 
> Hello!
> 
> What is the status of this integration? We decided to adopt Mesos as a core
> component of our infrastructure and we have recently found out that our
> users have significant need for Kubernetes. Being able to support
> Kubernetes on Mesos is crucial to us.
> 
> Thanks,
> Bubutech
> 
Use the kube-mesos-framework for running Kubernetes on Mesos.

For more information please refer to:
https://github.com/kubernetes-incubator/kube-mesos-framework


Catching the webui up to features

2017-09-11 Thread Benjamin Mahler
Hi folks,

Over time the webui has lagged behind for some of the features that have
been added. I'm currently tracking what's required to catch it up here:

https://issues.apache.org/jira/browse/MESOS-6440

If you know of other features that make sense to display in the webui, feel
free to file a ticket under this epic (or link it as related if it falls
under a different epic) and let me know. For example, I just filed another
one within it for displaying task health information.

Also feel free to make contributions to the webui even if you don't feel
that you're knowledgeable on the frontend side of things. The majority of
webui changes are very easy and provide a lot of value to users who
interact with it on a regular basis!

If you'd like to contribute to the webui, there are a lot of easy tickets
to get started with; here is one example that I would be happy to assist
with: https://issues.apache.org/jira/browse/MESOS-7962

Thanks!
Ben


Re: [Design Doc] Native Support for Prometheus Metrics

2017-09-11 Thread James Peach

> On Sep 9, 2017, at 5:29 AM, Benjamin Bannier wrote:
> 
> Hi James,
> 
> I'd like to make a longer comment here to make it easier to discuss.
> 
>> [...]
>> 
>> Note the proposal to alter how Timer metrics are exposed in an incompatible
>> way (I argue this is OK because you can't really make use of these metrics
>> now).
> 
> I am not sure I follow your argument around `Timer`. It is similar to a gauge
> caching the last value, plus associated statistics calculated from a time
> series.

I'm arguing that this does not provide useful semantics. When we think about
the real-world objects we are representing with Timers, they don't look at all
like what we represent with Gauges. For example, if I'm asked how much disk
space is free, giving an instantaneous value with no reference to prior state
(i.e. a Gauge) is informative and useful. Conversely, if I were asked to bill for
my work time over the last month and I handed you a bill for 10 minutes because
that was the last interval I worked, that answer would be seriously unhelpful.

> I have never used Prometheus, but a brief look at the Prometheus
> docs seems to suggest that a `Timer` could be mapped onto a Prometheus summary
> type with minimal modifications (namely, by adding a `sum` value that you
> propose as sole replacement).

Right, that's what the current implementation does.

> I believe that exposing statistics is useful, and moving all `Timer` metrics
> to counters (cumulative value and number of samples) would lead to information
> loss.

I'm not proposing that we remove the Timer statistics. I am, however, proposing
that representing a Timer as a cumulative count of elapsed time units makes it
possible to actually use Timers for practical purposes. When we plotted the
allocation_run Timer, we could see the difference between full and partial
allocation runs by the area under the graph. We could see the difference over
time, and we could even see how allocation runs behave across failover.
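A minimal sketch of the cumulative representation argued for here, assuming a Prometheus-style pair of monotonically increasing series (total elapsed time and observation count, like a summary's `_sum` and `_count`). The names `CumulativeTimer`, `Sample`, `busyFraction`, and `meanMs` are hypothetical and are not part of libprocess' `Timer` API.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical cumulative timer: instead of exposing only the most recent
// duration (gauge-like), it keeps a running total of elapsed time and a
// running count of observations. Both only ever increase.
class CumulativeTimer {
public:
  using Duration = std::chrono::milliseconds;

  void record(Duration elapsed) {
    assert(elapsed.count() >= 0);
    sum_ += elapsed;
    ++count_;
  }

  Duration sum() const { return sum_; }
  std::uint64_t count() const { return count_; }

private:
  Duration sum_{0};
  std::uint64_t count_ = 0;
};

// A scraped (sum, count) pair at one point in time.
struct Sample {
  std::chrono::milliseconds sum;
  std::uint64_t count;
};

// Fraction of the scrape interval spent inside the timed section, i.e. the
// rate of the cumulative sum; plotted over time, its area is the total time.
double busyFraction(const Sample& before, const Sample& after,
                    std::chrono::milliseconds interval) {
  assert(interval.count() > 0);
  return static_cast<double>((after.sum - before.sum).count()) /
         static_cast<double>(interval.count());
}

// Mean duration of every observation made between two scrapes (0 if none),
// unlike a "last value" that reflects only the most recent run.
double meanMs(const Sample& before, const Sample& after) {
  std::uint64_t n = after.count - before.count;
  return n == 0 ? 0.0
                : static_cast<double>((after.sum - before.sum).count()) / n;
}
```

Because both series only grow, the difference between any two scrapes accounts for every run in between, which is what makes comparing full and partial allocation runs by area possible.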

> Since most of your criticism of `Timer` is about its associated statistics,

That wasn't my intention. The problem with the Timer is the value and count 
fields. While I did mention that I think a raw histogram would be more useful, 
I explicitly put that out of scope.

> maybe we can make fixes to libprocess' `TimeSeries` and the derived
> `Statistics` to make them more usable. Right now `Statistics` seems to be more
> apt for dealing with timing measurements where one probably worries more about
> the long tail of the distribution (it only exposes the median and higher
> percentiles). It seems that if one would e.g., make the exposed percentiles
> configurable, it should be possible to expose a useful characterization of the
> underlying distribution (think: box plot). It might be that one would need to
> revisit how `TimeSeries` sparsifies older data to make sure the quantiles we
> expose are meaningful.

I agree that it is possible to measure and improve the statistics. I'd probably
approach this by adding extra instrumentation to capture all the raw Timer
observations. Then I would attempt to show that the running percentile summary
approximates the actual percentiles measured from the complete data.

>> First, note that the “allocator/mesos/allocation_run_ms/count” sample is not
>> useful at all. It has the semantics of a saturating counter that saturates at
>> the size of the bounded time series. To address this, there is another metric
>> “allocator/mesos/allocation_runs”, which tracks the actual count of
>> allocation runs (3161331.00 in this case). If you plot this counter over time
>> (i.e. as a rate), it will be zero for all time once it reaches saturation. In
>> the case of allocation runs, this is almost all the time, since 1000
>> allocations will be performed within a few hours.
> 
> While `count` is not a useful measure of the behavior of the measured datum,
> it is critical to assess whether the derived statistic is meaningful (sample
> size). Like you write, it becomes less interesting once enough data has been
> collected.

If the count doesn't saturate, it is always meaningful. If it is possible for
the metric to become non-meaningful, that's pretty bad. I'm not sure I accept
your premise here, though. Once the count saturates at 1000 samples, how do you
know whether the statistics are from the last hour or from 3 hours ago? It is
possible to accumulate no samples and for that to be invisible in the metrics.
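The saturation effect can be reproduced with a toy model. In this minimal C++ sketch, a hypothetical `BoundedSeries` keeps only the most recent `capacity` observations, so its window count behaves like `allocator/mesos/allocation_run_ms/count`, while a separate monotonic counter behaves like `allocator/mesos/allocation_runs`. Plain FIFO eviction is an assumption made for illustration; libprocess' `TimeSeries` sparsifies older data instead, but its size is likewise bounded by its capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a bounded time series: keeps at most `capacity`
// recent observations, evicting the oldest one when full.
class BoundedSeries {
public:
  explicit BoundedSeries(std::size_t capacity) : capacity_(capacity) {
    assert(capacity_ > 0);
  }

  void add(double value) {
    if (values_.size() == capacity_) {
      values_.pop_front();
    }
    values_.push_back(value);
    ++total_;  // Monotonic, like allocator/mesos/allocation_runs.
  }

  // Like .../allocation_run_ms/count: stops growing at `capacity`.
  std::size_t windowCount() const { return values_.size(); }

  // Never saturates, so its rate stays meaningful.
  std::size_t totalCount() const { return total_; }

private:
  std::size_t capacity_;
  std::deque<double> values_;
  std::size_t total_ = 0;
};
```

Once 1000 observations have been recorded, the window count never changes again, so its rate is zero; the total keeps moving, and nothing in the window count reveals how old its samples are.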

>> Finally, while the derived statistics metrics can be informative, they are
>> actually less expressive than a raw histogram would be. A raw histogram of
>> timed values would allow an observer to distinguish cases where there are
>> clear performance bands (e.g. when allocation times cluster at either 15ms or
>> 200ms), but the percentile statistics obscure this information.
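The following hypothetical C++ sketch makes the point about performance bands concrete. It builds two made-up sample sets: one with 60% of runs at 15ms and 40% at 200ms, and one that moves 10% of the runs into a band at 100ms. Their nearest-rank p50/p90/p95/p99 come out identical, yet a histogram with two bucket bounds separates them. The nearest-rank definition, the bucket bounds, and the function names are assumptions for illustration; this is not how libprocess' `Statistics` computes its values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile, p in (0, 100], over a sorted copy of the samples.
double percentile(std::vector<double> samples, double p) {
  assert(!samples.empty() && p > 0.0 && p <= 100.0);
  std::sort(samples.begin(), samples.end());
  // Rank is ceil(p/100 * N), which is 1-based; convert to a 0-based index.
  std::size_t rank = static_cast<std::size_t>(
      std::ceil(p / 100.0 * static_cast<double>(samples.size())));
  return samples[rank - 1];
}

// Per-bucket (non-cumulative) counts. `bounds` are ascending upper bounds,
// each bucket holding values <= its bound; the final slot holds the rest.
std::vector<std::size_t> histogram(const std::vector<double>& samples,
                                   const std::vector<double>& bounds) {
  assert(std::is_sorted(bounds.begin(), bounds.end()));
  std::vector<std::size_t> counts(bounds.size() + 1, 0);
  for (double s : samples) {
    auto it = std::lower_bound(bounds.begin(), bounds.end(), s);
    ++counts[static_cast<std::size_t>(it - bounds.begin())];
  }
  return counts;
}
```

Prometheus histograms expose cumulative `le` buckets rather than the per-bucket counts used here, but either form preserves the third band that the percentile summary hides.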
> 
> I would argue that is more a problem of `Statistics` only reporting
> percentiles from the far out,