[GitHub] mesos issue #263: Allow nested containers in pods to have separate namespace...
Github user qianzhangxa commented on the issue: https://github.com/apache/mesos/pull/263 I'd like to echo @jdef's comment: we need a clear use case for an IP per nested container. Our current status is that if a framework launches multiple task groups (pods) via a single default executor, all the nested containers of all these task groups will share the executor's network namespace. This is actually different from a Kubernetes pod, where each pod has its own network namespace and all the containers in a pod share that namespace, so that they can communicate via 127.0.0.1/localhost. IMHO, we should consider doing something similar to Kubernetes, i.e., each task group would have its own network namespace, rather than each nested container having its own network namespace, unless we have a use case for the latter. ---
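To make the namespace-sharing point above concrete, here is a minimal sketch of how one could check which network namespace two container processes are in on a Linux agent. The PIDs are hypothetical; the mechanism (`/proc/<pid>/ns/net`) is standard Linux:

```
# /proc/<pid>/ns/net is a symlink naming the process's network namespace.
# If two processes resolve to the same inode, they share one namespace and
# can reach each other over 127.0.0.1. (PIDs below are made up.)
$ readlink /proc/1234/ns/net
net:[4026531993]
$ readlink /proc/5678/ns/net
net:[4026531993]    # same inode, so these two containers share a namespace
```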
[GitHub] mesos issue #265: Update presentations.md
Github user judithpatudith commented on the issue: https://github.com/apache/mesos/pull/265 @packtpartner it looks like this content is gated, and none of the other presentations on the list are. Do you have a non-gated version to link to instead? ---
Re: Feb 21 Performance WG Meeting Canceled
One thing I'd like to follow up on is GraphQL-based query support. It might fit better in the API working group, but our usage is mostly about reducing load on the `/state` endpoint, so performance could be related.

On Tue, Feb 20, 2018 at 11:51 PM, Benjamin Mahler wrote:
> Hi folks, since there's nothing on the agenda for this month's meeting, I
> will cancel it and plan to meet next month. If there are any topics folks
> would like to discuss, let me know and we can schedule one sooner!

--
Cheers,
Zhitao Li
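For context on why GraphQL is attractive here: `/state` returns the entire cluster state on every poll, while a GraphQL query lets the client name just the fields it needs. A hypothetical query of the kind being floated (the schema below is illustrative only; no such Mesos API exists today) might look like:

```
# Illustrative only: fetch just task IDs and states for one framework,
# instead of downloading the entire /state document.
query {
  framework(frameworkId: "hypothetical-framework-id") {
    tasks {
      taskId
      state
    }
  }
}
```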
Re: Surfacing additional issues on agent host to schedulers
On Wed, Feb 21, 2018 at 11:18 AM, Zhitao Li wrote:
> Hi Avinash,
>
> We use haproxy for all outgoing traffic. For example, if an instance of
> service A wants to talk to service B, it actually calls a "localhost:"
> address backed by the local haproxy instance, which then forwards the
> request to some instance of service B.
>
> In such a situation, if the local haproxy is not functional, it's almost
> certain that anything making outgoing requests will not run properly, and
> we prefer to drain the host.

I am assuming the local HAProxy is not run within the purview of Mesos (it
could potentially be run as a standalone container starting with Mesos
1.5)? So how would Mesos even know that there is an issue with HAProxy and
surface it? The problem here seems to be that the containers' connectivity
is controlled by entities outside the Mesos domain. Reporting on problems
with these entities seems like a hard problem.

One option I can think of is to inject command health checks for the
containers that query the containers' endpoints through the frontends
exposed by the local HAProxy. This would allow detection of any failure in
HAProxy, which would be surfaced as a Mesos health check failure?

> On Wed, Feb 21, 2018 at 9:45 AM, Avinash Sridharan wrote:
> > On Tue, Feb 20, 2018 at 3:54 PM, James Peach wrote:
> > > > On Feb 20, 2018, at 11:11 AM, Zhitao Li wrote:
> > > >
> > > > Hi,
> > > >
> > > > In one of the recent Mesos meetups, quite a few cluster operators
> > > > complained that it is hard to model host issues with Mesos at the
> > > > moment.
> > > >
> > > > For example, in our environment, the only signal a scheduler would
> > > > know is whether the Mesos agent has disconnected from the cluster.
> > > > However, we have a family of other issues in real production which
> > > > make the hosts (sometimes "partially") unusable. Examples include:
> > > > - traffic routing software malfunction (e.g., haproxy): the Mesos
> > > > agent does not require this, so the scheduler/deployment system is
> > > > not aware, but actual workloads on the cluster will fail;
> >
> > Zhitao, could you elaborate on this a bit more? Do you mean the
> > workloads are being load-balanced by HAProxy and, due to
> > misconfiguration, the workloads are now unreachable, and somehow the
> > agent should be boiling up these network issues? I am guessing in your
> > case HAProxy is somehow involved in providing connectivity to workloads
> > on a given agent, and HAProxy is actually running on that agent?
> >
> > > > - broken disk;
> > > > - other long-running system agent issues.
> > > >
> > > > This email is looking at how Mesos can recommend best practices to
> > > > surface these issues to schedulers, and whether we need additional
> > > > primitives in Mesos to achieve such a goal.
> > >
> > > In the K8s world the node can publish "conditions" that describe its
> > > status:
> > >
> > > https://kubernetes.io/docs/concepts/architecture/nodes/#condition
> > >
> > > The condition can automatically taint the node, which could cause
> > > pods to automatically be evicted (i.e. if they can't tolerate that
> > > specific taint).
> > >
> > > J
> >
> > --
> > Avinash Sridharan, Mesosphere
> > +1 (323) 702 5245
>
> --
> Cheers,
> Zhitao Li

--
Avinash Sridharan, Mesosphere
+1 (323) 702 5245
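To sketch the injected-health-check idea above using Mesos's existing COMMAND health checks: the shape below follows the HealthCheck message in the v1 API, but the port, path, and timing values are all made up for illustration:

```
{
  "health_check": {
    "type": "COMMAND",
    "command": {
      "value": "curl --fail --max-time 5 http://localhost:9090/service-b/health"
    },
    "delay_seconds": 15,
    "interval_seconds": 10,
    "consecutive_failures": 3
  }
}
```

Because the curl goes through the frontend exposed by the local HAProxy, an HAProxy outage would show up as a task health check failure, which schedulers already know how to consume.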
Re: Surfacing additional issues on agent host to schedulers
Hi Avinash,

We use haproxy for all outgoing traffic. For example, if an instance of
service A wants to talk to service B, it actually calls a "localhost:"
address backed by the local haproxy instance, which then forwards the
request to some instance of service B.

In such a situation, if the local haproxy is not functional, it's almost
certain that anything making outgoing requests will not run properly, and
we prefer to drain the host.

On Wed, Feb 21, 2018 at 9:45 AM, Avinash Sridharan wrote:
> On Tue, Feb 20, 2018 at 3:54 PM, James Peach wrote:
> > > On Feb 20, 2018, at 11:11 AM, Zhitao Li wrote:
> > >
> > > Hi,
> > >
> > > In one of the recent Mesos meetups, quite a few cluster operators
> > > complained that it is hard to model host issues with Mesos at the
> > > moment.
> > >
> > > For example, in our environment, the only signal a scheduler would
> > > know is whether the Mesos agent has disconnected from the cluster.
> > > However, we have a family of other issues in real production which
> > > make the hosts (sometimes "partially") unusable. Examples include:
> > > - traffic routing software malfunction (e.g., haproxy): the Mesos
> > > agent does not require this, so the scheduler/deployment system is
> > > not aware, but actual workloads on the cluster will fail;
>
> Zhitao, could you elaborate on this a bit more? Do you mean the workloads
> are being load-balanced by HAProxy and, due to misconfiguration, the
> workloads are now unreachable, and somehow the agent should be boiling up
> these network issues? I am guessing in your case HAProxy is somehow
> involved in providing connectivity to workloads on a given agent, and
> HAProxy is actually running on that agent?
>
> > > - broken disk;
> > > - other long-running system agent issues.
> > >
> > > This email is looking at how Mesos can recommend best practices to
> > > surface these issues to schedulers, and whether we need additional
> > > primitives in Mesos to achieve such a goal.
> >
> > In the K8s world the node can publish "conditions" that describe its
> > status:
> >
> > https://kubernetes.io/docs/concepts/architecture/nodes/#condition
> >
> > The condition can automatically taint the node, which could cause pods
> > to automatically be evicted (i.e. if they can't tolerate that specific
> > taint).
> >
> > J
>
> --
> Avinash Sridharan, Mesosphere
> +1 (323) 702 5245

--
Cheers,
Zhitao Li
Re: Surfacing additional issues on agent host to schedulers
Hi James,

The "condition" list you described fits our modeling pretty well, although
I don't know whether the eviction is made by a scheduler or the local
kubelet proxy. Do you know whether the conditions can be extended, so that
an operator can define additional conditions that are not in the provided
list?

On Tue, Feb 20, 2018 at 3:54 PM, James Peach wrote:
> > On Feb 20, 2018, at 11:11 AM, Zhitao Li wrote:
> >
> > Hi,
> >
> > In one of the recent Mesos meetups, quite a few cluster operators
> > complained that it is hard to model host issues with Mesos at the
> > moment.
> >
> > For example, in our environment, the only signal a scheduler would know
> > is whether the Mesos agent has disconnected from the cluster. However,
> > we have a family of other issues in real production which make the
> > hosts (sometimes "partially") unusable. Examples include:
> > - traffic routing software malfunction (e.g., haproxy): the Mesos agent
> > does not require this, so the scheduler/deployment system is not aware,
> > but actual workloads on the cluster will fail;
> > - broken disk;
> > - other long-running system agent issues.
> >
> > This email is looking at how Mesos can recommend best practices to
> > surface these issues to schedulers, and whether we need additional
> > primitives in Mesos to achieve such a goal.
>
> In the K8s world the node can publish "conditions" that describe its
> status:
>
> https://kubernetes.io/docs/concepts/architecture/nodes/#condition
>
> The condition can automatically taint the node, which could cause pods to
> automatically be evicted (i.e. if they can't tolerate that specific
> taint).
>
> J

--
Cheers,
Zhitao Li
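For what it's worth, the taint/toleration side of what James described is scriptable in Kubernetes today; a minimal sketch, with a made-up node name and taint key:

```
# Taint the node so pods lacking a matching toleration get evicted.
$ kubectl taint nodes node-1 example.com/haproxy-broken=true:NoExecute

# The node's published conditions appear under "Conditions:" in:
$ kubectl describe node node-1

# A pod opts out of eviction by declaring a toleration in its spec:
#   tolerations:
#   - key: "example.com/haproxy-broken"
#     operator: "Equal"
#     value: "true"
#     effect: "NoExecute"
```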
Re: Surfacing additional issues on agent host to schedulers
On Tue, Feb 20, 2018 at 3:54 PM, James Peach wrote:
> > On Feb 20, 2018, at 11:11 AM, Zhitao Li wrote:
> >
> > Hi,
> >
> > In one of the recent Mesos meetups, quite a few cluster operators
> > complained that it is hard to model host issues with Mesos at the
> > moment.
> >
> > For example, in our environment, the only signal a scheduler would know
> > is whether the Mesos agent has disconnected from the cluster. However,
> > we have a family of other issues in real production which make the
> > hosts (sometimes "partially") unusable. Examples include:
> > - traffic routing software malfunction (e.g., haproxy): the Mesos agent
> > does not require this, so the scheduler/deployment system is not aware,
> > but actual workloads on the cluster will fail;

Zhitao, could you elaborate on this a bit more? Do you mean the workloads
are being load-balanced by HAProxy and, due to misconfiguration, the
workloads are now unreachable, and somehow the agent should be boiling up
these network issues? I am guessing in your case HAProxy is somehow
involved in providing connectivity to workloads on a given agent, and
HAProxy is actually running on that agent?

> > - broken disk;
> > - other long-running system agent issues.
> >
> > This email is looking at how Mesos can recommend best practices to
> > surface these issues to schedulers, and whether we need additional
> > primitives in Mesos to achieve such a goal.
>
> In the K8s world the node can publish "conditions" that describe its
> status:
>
> https://kubernetes.io/docs/concepts/architecture/nodes/#condition
>
> The condition can automatically taint the node, which could cause pods to
> automatically be evicted (i.e. if they can't tolerate that specific
> taint).
>
> J

--
Avinash Sridharan, Mesosphere
+1 (323) 702 5245
FINAL REMINDER: CFP for Apache EU Roadshow Closes 25th February
Hello Apache Supporters and Enthusiasts

This is your FINAL reminder that the Call for Papers (CFP) for the Apache
EU Roadshow is closing soon. Our Apache EU Roadshow will focus on Cloud,
IoT, Apache Tomcat, and Apache HTTP, and will run from 13-14 June 2018 in
Berlin. Note that the CFP deadline has been extended to 25th February, and
it will be your final opportunity to submit a talk for this event.

Please make your submissions at http://apachecon.com/euroadshow18/

Also note that early bird ticket registrations to attend FOSS Backstage,
including the Apache EU Roadshow, have been extended and will be available
until 23rd February. Please register at https://foss-backstage.de/tickets

We look forward to seeing you in Berlin!

Thanks
Sharan Foga, VP Apache Community Development

PLEASE NOTE: You are receiving this message because you are subscribed to
a user@ or dev@ list of one or more Apache Software Foundation projects.
[GitHub] mesos pull request #265: Update presentations.md
GitHub user packtpartner opened a pull request:

    https://github.com/apache/mesos/pull/265

    Update presentations.md

    Added a new video listing

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/packtpartner/mesos patch-5

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mesos/pull/265.patch

To close this pull request, make a commit to your master/trunk branch with
(at least) the following in the commit message:

    This closes #265

commit 28716c734e7f9a9415e708c8bec49387177f2400
Author: Packt
Date: 2018-02-21T10:44:35Z

    Update presentations.md

---