[GitHub] mesos issue #263: Allow nested containers in pods to have separate namespace...

2018-02-21 Thread qianzhangxa
Github user qianzhangxa commented on the issue:

https://github.com/apache/mesos/pull/263
  
I'd like to echo @jdef's comment: we need a clear use case for an IP per
nested container. Our current status is that if a framework launches
multiple task groups (pods) via a single default executor, all the nested
containers of all these task groups share the executor's network namespace.
This is actually different from a Kubernetes pod, where each pod has its own
network namespace and all the containers in a pod share that namespace so
they can communicate via 127.0.0.1/localhost. IMHO, we should consider doing
something similar to Kubernetes, i.e., give each task group its own network
namespace rather than each nested container, unless we have a use case for
the latter.
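For reference, this is the Kubernetes behavior I mean. A minimal pod sketch
(the images are just examples) where both containers share the pod's network
namespace, so the client can reach the server on 127.0.0.1:

    apiVersion: v1
    kind: Pod
    metadata:
      name: shared-netns-example
    spec:
      containers:
      # Both containers join the same network namespace, so the server's
      # port 80 is reachable from the client at 127.0.0.1:80.
      - name: server
        image: nginx
        ports:
        - containerPort: 80
      - name: client
        image: curlimages/curl
        command: ["sh", "-c", "sleep 5 && curl -s http://127.0.0.1:80/"]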


---


[GitHub] mesos issue #265: Update presentations.md

2018-02-21 Thread judithpatudith
Github user judithpatudith commented on the issue:

https://github.com/apache/mesos/pull/265
  
@packtpartner it looks like this content is gated, and none of the other 
presentations on the list are. Do you have a non-gated version to link to 
instead?


---


Re: Feb 21 Performance WG Meeting Canceled

2018-02-21 Thread Zhitao Li
One thing I'd like to follow up on is GraphQL-based query support. It
might fit better in the API workgroup, but our usage is more about reducing
load on the `/state` endpoint, so it could be performance-related.
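To make that concrete: a client could fetch only the fields it needs
instead of the full `/state` payload. A hypothetical query (Mesos has no
GraphQL schema today, so the field names below are invented):

    {
      frameworks(active: true) {
        id
        name
        tasks {
          id
          state
        }
      }
    }

This would return just the active frameworks and their task states rather
than the entire cluster state document.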

On Tue, Feb 20, 2018 at 11:51 PM, Benjamin Mahler 
wrote:

> Hi folks, since there's nothing on the agenda for this month's meeting, I
> will cancel it and plan to meet next month. If there are any topics folks
> would like to discuss let me know and we can schedule one sooner!
>



-- 
Cheers,

Zhitao Li


Re: Surfacing additional issues on agent host to schedulers

2018-02-21 Thread Avinash Sridharan
On Wed, Feb 21, 2018 at 11:18 AM, Zhitao Li  wrote:

> Hi Avinash,
>
> We use haproxy for all outgoing traffic. For example, if an instance of
> service A wants to talk to service B, what it actually does is call a
> "localhost:" address backed by the local haproxy instance, which then
> forwards the request to some instance of service B.
>
> In such a situation, if the local haproxy is not functional, it's almost
> certain that anything making outgoing requests will not run properly, and
> we prefer to drain the host.
>

I am assuming the local HAProxy is not run within the purview of Mesos (it
could potentially be run as a stand-alone container starting with Mesos
1.5)? So how would Mesos even know that there is an issue with HAProxy and
bubble it up? The problem here seems to be that the containers'
connectivity is controlled by entities outside the Mesos domain. Reporting
on problems with these entities seems like a hard problem.

One option I can think of is to inject command health checks for the
containers that query the containers' endpoints through the frontends
exposed by the local HAProxy. This would allow the detection of any
failure in HAProxy, which would be bubbled up as a Mesos health check
failure?
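For example, something along these lines as the task's `health_check` (the
port and path are placeholders for whatever frontend the local HAProxy
exposes; just a sketch):

    {
      "type": "COMMAND",
      "command": {
        "shell": true,
        "value": "curl -fsS http://127.0.0.1:9001/health"
      },
      "interval_seconds": 10,
      "consecutive_failures": 3
    }

If the local HAProxy stops forwarding, the command fails repeatedly and the
scheduler sees it via the `healthy` flag on task status updates.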

>
> On Wed, Feb 21, 2018 at 9:45 AM, Avinash Sridharan 
> wrote:
>
> > On Tue, Feb 20, 2018 at 3:54 PM, James Peach  wrote:
> >
> > >
> > > > On Feb 20, 2018, at 11:11 AM, Zhitao Li  wrote:
> > > >
> > > > Hi,
> > > >
> > > > In a recent Mesos meetup, quite a few cluster operators expressed
> > > > complaints that it is hard to model host issues with Mesos at the
> > > > moment.
> > > >
> > > > For example, in our environment, the only signal the scheduler gets
> > > > is whether the Mesos agent has disconnected from the cluster.
> > > > However, we have a family of other issues in real production which
> > > > make the hosts (sometimes "partially") unusable. Examples include:
> > > > - traffic routing software malfunction (e.g., haproxy): the Mesos
> > > > agent does not require this, so the scheduler/deployment system is
> > > > not aware, but the actual workload on the cluster will fail;
> > >
> > Zhitao, could you elaborate on this a bit more? Do you mean the workloads
> > are being load-balanced by HAProxy, and due to misconfiguration the
> > workloads are now unreachable, and somehow the agent should be bubbling
> > up these network issues? I am guessing that in your case HAProxy is
> > somehow involved in providing connectivity to workloads on a given
> > agent, and HAProxy is actually running on that agent?
> >
> >
> > > > - broken disk;
> > > > - other long-running system agent issues.
> > > >
> > > > This email looks at how Mesos can recommend best practices for
> > > > surfacing these issues to schedulers, and whether we need additional
> > > > primitives in Mesos to achieve that goal.
> > >
> > > In the K8s world the node can publish "conditions" that describe its
> > > status:
> > >
> > > https://kubernetes.io/docs/concepts/architecture/nodes/#condition
> > >
> > > The condition can automatically taint the node, which could cause pods
> > > to automatically be evicted (i.e. if they can't tolerate that specific
> > > taint).
> > >
> > > J
> >
> >
> >
> >
> > --
> > Avinash Sridharan, Mesosphere
> > +1 (323) 702 5245
> >
>
>
>
> --
> Cheers,
>
> Zhitao Li
>



-- 
Avinash Sridharan, Mesosphere
+1 (323) 702 5245


Re: Surfacing additional issues on agent host to schedulers

2018-02-21 Thread Zhitao Li
Hi Avinash,

We use haproxy for all outgoing traffic. For example, if an instance of
service A wants to talk to service B, what it actually does is call a
"localhost:" address backed by the local haproxy instance, which then
forwards the request to some instance of service B.

In such a situation, if the local haproxy is not functional, it's almost
certain that anything making outgoing requests will not run properly, and
we prefer to drain the host.
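Concretely, the local setup looks roughly like this in haproxy terms (the
port, backend name, and addresses below are made up for illustration):

    # Service A calls "localhost:9001"; haproxy forwards to service B.
    frontend service_b_local
        bind 127.0.0.1:9001
        default_backend service_b

    backend service_b
        # Instances of service B, kept up to date by our config pipeline.
        server b1 10.0.0.5:8080 check
        server b2 10.0.0.6:8080 check

So when this local haproxy is broken, every outgoing call from the host
fails even though the Mesos agent itself looks healthy.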

On Wed, Feb 21, 2018 at 9:45 AM, Avinash Sridharan 
wrote:

> On Tue, Feb 20, 2018 at 3:54 PM, James Peach  wrote:
>
> >
> > > On Feb 20, 2018, at 11:11 AM, Zhitao Li  wrote:
> > >
> > > Hi,
> > >
> > > In a recent Mesos meetup, quite a few cluster operators expressed
> > > complaints that it is hard to model host issues with Mesos at the
> > > moment.
> > >
> > > For example, in our environment, the only signal the scheduler gets is
> > > whether the Mesos agent has disconnected from the cluster. However, we
> > > have a family of other issues in real production which make the hosts
> > > (sometimes "partially") unusable. Examples include:
> > > - traffic routing software malfunction (e.g., haproxy): the Mesos agent
> > > does not require this, so the scheduler/deployment system is not aware,
> > > but the actual workload on the cluster will fail;
> >
> Zhitao, could you elaborate on this a bit more? Do you mean the workloads
> are being load-balanced by HAProxy, and due to misconfiguration the
> workloads are now unreachable, and somehow the agent should be bubbling up
> these network issues? I am guessing that in your case HAProxy is somehow
> involved in providing connectivity to workloads on a given agent, and
> HAProxy is actually running on that agent?
>
>
> > > - broken disk;
> > > - other long-running system agent issues.
> > >
> > > This email looks at how Mesos can recommend best practices for
> > > surfacing these issues to schedulers, and whether we need additional
> > > primitives in Mesos to achieve that goal.
> >
> > In the K8s world the node can publish "conditions" that describe its
> > status:
> >
> > https://kubernetes.io/docs/concepts/architecture/nodes/#condition
> >
> > The condition can automatically taint the node, which could cause pods to
> > automatically be evicted (i.e. if they can't tolerate that specific
> > taint).
> >
> > J
>
>
>
>
> --
> Avinash Sridharan, Mesosphere
> +1 (323) 702 5245
>



-- 
Cheers,

Zhitao Li


Re: Surfacing additional issues on agent host to schedulers

2018-02-21 Thread Zhitao Li
Hi James,

The "condition" list you described fits our modeling pretty well, although
I don't know whether the eviction is made by a scheduler or the local
kubelet proxy.

Do you know whether the conditions can be extended and operator can define
additional conditions which is not in the provided list?
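(For reference, a condition seems to be just an entry in the node's status,
so I'd guess an external agent such as node-problem-detector could publish
custom types; a sketch with illustrative values:)

    {
      "type": "KernelDeadlock",
      "status": "True",
      "reason": "DockerHung",
      "message": "task docker:1234 blocked for more than 120 seconds",
      "lastHeartbeatTime": "2018-02-20T10:00:00Z",
      "lastTransitionTime": "2018-02-20T09:58:00Z"
    }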

On Tue, Feb 20, 2018 at 3:54 PM, James Peach  wrote:

>
> > On Feb 20, 2018, at 11:11 AM, Zhitao Li  wrote:
> >
> > Hi,
> >
> > In a recent Mesos meetup, quite a few cluster operators expressed
> > complaints that it is hard to model host issues with Mesos at the
> > moment.
> >
> > For example, in our environment, the only signal the scheduler gets is
> > whether the Mesos agent has disconnected from the cluster. However, we
> > have a family of other issues in real production which make the hosts
> > (sometimes "partially") unusable. Examples include:
> > - traffic routing software malfunction (e.g., haproxy): the Mesos agent
> > does not require this, so the scheduler/deployment system is not aware,
> > but the actual workload on the cluster will fail;
> > - broken disk;
> > - other long-running system agent issues.
> >
> > This email looks at how Mesos can recommend best practices for surfacing
> > these issues to schedulers, and whether we need additional primitives in
> > Mesos to achieve that goal.
>
> In the K8s world the node can publish "conditions" that describe its status
>
> https://kubernetes.io/docs/concepts/architecture/nodes/#condition
>
> The condition can automatically taint the node, which could cause pods to
> automatically be evicted (i.e. if they can't tolerate that specific taint).
>
> J




-- 
Cheers,

Zhitao Li


Re: Surfacing additional issues on agent host to schedulers

2018-02-21 Thread Avinash Sridharan
On Tue, Feb 20, 2018 at 3:54 PM, James Peach  wrote:

>
> > On Feb 20, 2018, at 11:11 AM, Zhitao Li  wrote:
> >
> > Hi,
> >
> > In a recent Mesos meetup, quite a few cluster operators expressed
> > complaints that it is hard to model host issues with Mesos at the
> > moment.
> >
> > For example, in our environment, the only signal the scheduler gets is
> > whether the Mesos agent has disconnected from the cluster. However, we
> > have a family of other issues in real production which make the hosts
> > (sometimes "partially") unusable. Examples include:
> > - traffic routing software malfunction (e.g., haproxy): the Mesos agent
> > does not require this, so the scheduler/deployment system is not aware,
> > but the actual workload on the cluster will fail;
>
Zhitao, could you elaborate on this a bit more? Do you mean the workloads
are being load-balanced by HAProxy, and due to misconfiguration the
workloads are now unreachable, and somehow the agent should be bubbling up
these network issues? I am guessing that in your case HAProxy is somehow
involved in providing connectivity to workloads on a given agent, and
HAProxy is actually running on that agent?


> > - broken disk;
> > - other long-running system agent issues.
> >
> > This email looks at how Mesos can recommend best practices for surfacing
> > these issues to schedulers, and whether we need additional primitives in
> > Mesos to achieve that goal.
>
> In the K8s world the node can publish "conditions" that describe its status
>
> https://kubernetes.io/docs/concepts/architecture/nodes/#condition
>
> The condition can automatically taint the node, which could cause pods to
> automatically be evicted (i.e. if they can't tolerate that specific taint).
>
> J




-- 
Avinash Sridharan, Mesosphere
+1 (323) 702 5245


FINAL REMINDER: CFP for Apache EU Roadshow Closes 25th February

2018-02-21 Thread Sharan F

Hello Apache Supporters and Enthusiasts

This is your FINAL reminder that the Call for Papers (CFP) for the
Apache EU Roadshow is closing soon. Our Apache EU Roadshow will focus on
Cloud, IoT, Apache Tomcat, and Apache HTTP Server, and will run from
13-14 June 2018 in Berlin.

Note that the CFP deadline has been extended to 25th February, and
it will be your final opportunity to submit a talk for this event.


Please make your submissions at http://apachecon.com/euroadshow18/

Also note that early bird ticket registrations to attend FOSS Backstage,
including the Apache EU Roadshow, have also been extended and will be
available until 23rd February. Please register at
https://foss-backstage.de/tickets


We look forward to seeing you in Berlin!

Thanks
Sharan Foga, VP Apache Community Development

PLEASE NOTE: You are receiving this message because you are subscribed 
to a user@ or dev@ list of one or more Apache Software Foundation projects.




[GitHub] mesos pull request #265: Update presentations.md

2018-02-21 Thread packtpartner
GitHub user packtpartner opened a pull request:

https://github.com/apache/mesos/pull/265

Update presentations.md

Added a new video listing

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/packtpartner/mesos patch-5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mesos/pull/265.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #265


commit 28716c734e7f9a9415e708c8bec49387177f2400
Author: Packt 
Date:   2018-02-21T10:44:35Z

Update presentations.md




---