[jira] [Created] (MESOS-8933) Stop sending offers from agents in draining mode
Sagar Sadashiv Patwardhan created MESOS-8933: Summary: Stop sending offers from agents in draining mode Key: MESOS-8933 URL: https://issues.apache.org/jira/browse/MESOS-8933 Project: Mesos Issue Type: Improvement Reporter: Sagar Sadashiv Patwardhan *Background:* At Yelp, we use Mesos to run microservices (Marathon), batch jobs (Chronos and custom frameworks), Spark (the Spark Mesos framework), etc. We also autoscale the number of agents in our cluster based on current demand and some other metrics, and we use Mesos maintenance primitives to gracefully shut down agents. *Problem:* When we want to shut down an agent for some reason, we first move the agent into draining mode. This allows us to gracefully terminate the microservices and other tasks. But Mesos continues to send offers from that agent with unavailability set, and frameworks such as Marathon, Chronos, and Spark ignore the unavailability and schedule tasks on the agent anyway. To prevent this from happening, we allocate all the available resources on that agent to a maintenance role. But this approach is not fool-proof: there is still a race condition between when we move the agent into draining mode and when we allocate all of its available resources to the maintenance role. *Proposal:* It would be nice if Mesos stopped sending offers from agents in draining mode. Something like this: [https://gist.github.com/sagar8192/0b9dbccc908818f8f9f5a18d1f634513] I don't know whether this affects the allocator or not. We could put this behind a flag (something like --do-not-send-offers-from-agents-in-draining-mode) and make it optional. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
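For context, the kind of payload such an autoscaler POSTs to the master's /maintenance/schedule endpoint can be sketched as follows. This is a minimal sketch: the field names follow the Mesos maintenance HTTP API, while the hostname, IP, and window length are illustrative.

```python
import json
import time

def build_maintenance_schedule(machines, start_ns=None, duration_ns=3600 * 10**9):
    """Build a /maintenance/schedule payload with one unavailability window.

    `machines` is a list of (hostname, ip) pairs. Times are expressed in
    nanoseconds, as the maintenance API expects.
    """
    if start_ns is None:
        start_ns = int(time.time() * 10**9)
    return {
        "windows": [
            {
                "machine_ids": [
                    {"hostname": host, "ip": ip} for host, ip in machines
                ],
                "unavailability": {
                    "start": {"nanoseconds": start_ns},
                    "duration": {"nanoseconds": duration_ns},
                },
            }
        ]
    }

# An autoscaler would POST this JSON to the master, e.g.:
#   curl -X POST http://<master>:5050/maintenance/schedule \
#        -H 'Content-Type: application/json' -d @schedule.json
schedule = build_maintenance_schedule([("agent1.example.com", "10.0.0.1")])
print(json.dumps(schedule, indent=2))
```

Once a window like this is posted, the agents listed in `machine_ids` are the ones the proposal above would stop sending offers from.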
[jira] [Commented] (MESOS-8534) Allow nested containers in TaskGroups to have separate network namespaces
[ https://issues.apache.org/jira/browse/MESOS-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373786#comment-16373786 ] Sagar Sadashiv Patwardhan commented on MESOS-8534: -- Hi Qian Zhang, the reason behind using pods is that they provide atomicity (all-or-nothing semantics). We want either all of the containers to start or none of them. If we put the containers in different pods, we will not be able to use the all-or-nothing semantics of pods. We had some discussion about this in today's containerizer WG. Please check it out when you get a chance; I think it will help clarify your questions. > Allow nested containers in TaskGroups to have separate network namespaces > - > > Key: MESOS-8534 > URL: https://issues.apache.org/jira/browse/MESOS-8534 > Project: Mesos > Issue Type: Task > Components: containerization > Reporter: Sagar Sadashiv Patwardhan > Priority: Minor > Labels: cni > > As per the discussion with [~jieyu] and [~avinash.mesos], I am going to > allow nested containers in TaskGroups to have separate namespaces. I am also > going to retain the existing functionality, where nested containers can share > namespaces with the parent/root container. > *Use case:* At Yelp, we have an application called seagull that runs > multiple tasks in parallel. It is mainly used for running tests that depend > on other containerized internal microservices. It was developed before Mesos > had support for the docker executor, so it uses a custom executor, which > talks directly to the docker daemon on the host and runs a bunch of service > containers alongside the process where the tests are executed. Resources for all > these containers are not accounted for in Mesos, and clean-up of these containers > is also a headache. We have a tool called docker-reaper that automatically > reaps orphaned containers once the executor goes away. In addition to > that, we also run a few cron jobs that clean up any leftover containers. 
> We are in the process of containerizing the process that runs the tests. We > also want to delegate the responsibility of lifecycle management of docker > containers to Mesos and get rid of the custom executor. We looked at a few > alternatives and decided to go with pods because they provide the > all-or-nothing (atomicity) semantics that we need for our application. But we > cannot use pods directly, because all the containers in a pod share the same > network namespace. Our service discovery mechanism requires all the > containers to have separate IPs. All of our microservices bind to a > container port, so we will have port collisions unless we give separate > namespaces to all the containers in a pod. > *Proposal:* I am planning to allow nested containers to have separate > namespaces. If the NetworkInfo protobuf for a nested container is not empty, then > we will assign separate mnt and network namespaces to that nested container. > Otherwise, it will share the network and mount namespaces with the > parent/root container.
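The proposal above can be illustrated with a sketch of a task group in the JSON form of the Mesos v1 protobufs. This is illustrative only: the task names, images, and the CNI network name ("illustrative-cni-net") are hypothetical, not part of the ticket.

```python
# Each nested container requests its own network namespace by carrying a
# non-empty network_infos list in its ContainerInfo, which under the
# proposal would trigger separate network (and mnt) namespaces.
def make_task(name, image):
    return {
        "name": name,
        "task_id": {"value": name},
        "container": {
            "type": "MESOS",
            "mesos": {"image": {"type": "DOCKER", "docker": {"name": image}}},
            "network_infos": [{"name": "illustrative-cni-net"}],
        },
    }

task_group = {
    "tasks": [
        make_task("service-a", "svc-a:latest"),
        make_task("service-b", "svc-b:latest"),
    ]
}

# Every nested container carries its own NetworkInfo, so each would get
# its own IP instead of sharing the parent/root container's namespace.
assert all(t["container"]["network_infos"] for t in task_group["tasks"])
```

A task that omits `network_infos` would keep today's behavior and share namespaces with the parent/root container.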
[jira] [Commented] (MESOS-8534) Allow nested containers in TaskGroups to have separate network namespaces
[ https://issues.apache.org/jira/browse/MESOS-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370501#comment-16370501 ] Sagar Sadashiv Patwardhan commented on MESOS-8534: -- [~alexr] We can document this behavior and add a check to validate that HTTP/TCP health checks are not set for nested containers that are requesting separate namespaces (i.e. have NetworkInfos set in their ContainerInfo). If people are interested in using an HTTP health check, we can ask them to use a command check with `curl` instead, since the two are equivalent. I will open a follow-up ticket to fix HTTP and TCP health checks: we just need to find the PID of the container and clone the health-check process with the namespaces of the target nested container. > Allow nested containers in TaskGroups to have separate network namespaces > - > > Key: MESOS-8534 > URL: https://issues.apache.org/jira/browse/MESOS-8534 > Project: Mesos > Issue Type: Task > Components: containerization > Reporter: Sagar Sadashiv Patwardhan > Priority: Minor > Labels: cni > > As per the discussion with [~jieyu] and [~avinash.mesos], I am going to > allow nested containers in TaskGroups to have separate namespaces. I am also > going to retain the existing functionality, where nested containers can share > namespaces with the parent/root container.
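The validation mentioned above can be sketched minimally as follows. This is Python for illustration only (the real check would live in Mesos' task validation code, in C++); the dict layout mirrors the JSON form of the protobufs, and the network name is hypothetical.

```python
def validate_health_check(task):
    """Reject HTTP/TCP health checks for nested containers that request
    their own network namespace (i.e. have network_infos set)."""
    container = task.get("container", {})
    check = task.get("health_check")
    if container.get("network_infos") and check \
            and check.get("type") in ("HTTP", "TCP"):
        raise ValueError(
            "HTTP/TCP health checks are not supported for nested containers "
            "with separate network namespaces; use a COMMAND check "
            "(e.g. curl) instead")

# A nested container with its own NetworkInfo and an HTTP check is rejected:
bad = {"container": {"network_infos": [{"name": "cni-net"}]},
       "health_check": {"type": "HTTP"}}
try:
    validate_health_check(bad)
    rejected = False
except ValueError:
    rejected = True
```

The same task with a COMMAND check (or with no `network_infos`) would pass the check unchanged.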
[jira] [Comment Edited] (MESOS-8534) Allow nested containers in TaskGroups to have separate network namespaces
[ https://issues.apache.org/jira/browse/MESOS-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362737#comment-16362737 ] Sagar Sadashiv Patwardhan edited comment on MESOS-8534 at 2/16/18 1:16 AM: --- [~alexr] Yes, this will affect both HTTP and TCP health checks. Let me figure out what can be done to retain the existing functionality. was (Author: sagar8192): [~alexr] Yes, I think this will affect both HTTP and TCP health checks. Let me figure out what can be done to retain the existing functionality. > Allow nested containers in TaskGroups to have separate network namespaces > - > > Key: MESOS-8534 > URL: https://issues.apache.org/jira/browse/MESOS-8534 > Project: Mesos > Issue Type: Task > Components: containerization > Reporter: Sagar Sadashiv Patwardhan > Priority: Minor > Labels: cni > > As per the discussion with [~jieyu] and [~avinash.mesos], I am going to > allow nested containers in TaskGroups to have separate namespaces. I am also > going to retain the existing functionality, where nested containers can share > namespaces with the parent/root container.
[jira] [Commented] (MESOS-8534) Allow nested containers in TaskGroups to have separate network namespaces
[ https://issues.apache.org/jira/browse/MESOS-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366490#comment-16366490 ] Sagar Sadashiv Patwardhan commented on MESOS-8534: -- I discussed this with [~jieyu] today. Making TCP and HTTP health checks work is not straightforward and will require a lot of work. He suggested that we use a command check instead. A command check for nested containers already executes commands under the target nested container's namespaces, so we can use `curl 127.0.0.1:` instead of an HTTP health check. This solution works for our use case. > Allow nested containers in TaskGroups to have separate network namespaces > - > > Key: MESOS-8534 > URL: https://issues.apache.org/jira/browse/MESOS-8534 > Project: Mesos > Issue Type: Task > Components: containerization > Reporter: Sagar Sadashiv Patwardhan > Priority: Minor > Labels: cni > > As per the discussion with [~jieyu] and [~avinash.mesos], I am going to > allow nested containers in TaskGroups to have separate namespaces. I am also > going to retain the existing functionality, where nested containers can > connect to the parent/root container's namespace.
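The suggested workaround can be written down as a COMMAND health check in the JSON form of the Mesos v1 HealthCheck message. Note the port and path here are hypothetical (the comment above elides the actual port), and the timing values are arbitrary examples.

```python
# A COMMAND check equivalent to an HTTP check: because command checks for
# nested containers run under the target container's namespaces, 127.0.0.1
# refers to the nested container's own network namespace.
health_check = {
    "type": "COMMAND",
    "command": {
        # --fail makes curl exit non-zero on HTTP errors, which the
        # executor interprets as a failed check.
        "value": "curl --fail --silent http://127.0.0.1:8080/status",
    },
    "delay_seconds": 15,
    "interval_seconds": 10,
    "timeout_seconds": 5,
}
```

This definition would go in the task's `health_check` field in place of an `{"type": "HTTP", ...}` check.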
[jira] [Commented] (MESOS-8534) Allow nested containers in TaskGroups to have separate network namespaces
[ https://issues.apache.org/jira/browse/MESOS-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362737#comment-16362737 ] Sagar Sadashiv Patwardhan commented on MESOS-8534: -- [~alexr] Yes, I think this will affect both HTTP and TCP health checks. Let me figure out what can be done to retain the existing functionality. > Allow nested containers in TaskGroups to have separate network namespaces > - > > Key: MESOS-8534 > URL: https://issues.apache.org/jira/browse/MESOS-8534 > Project: Mesos > Issue Type: Task > Components: containerization > Reporter: Sagar Sadashiv Patwardhan > Priority: Minor > Labels: cni > > As per the discussion with [~jieyu] and [~avinash.mesos], I am going to > allow nested containers in TaskGroups to have separate namespaces. I am also > going to retain the existing functionality, where nested containers can > connect to the parent/root container's namespace.
[jira] [Created] (MESOS-8534) Allow nested containers in TaskGroups to have separate network namespaces
Sagar Sadashiv Patwardhan created MESOS-8534: Summary: Allow nested containers in TaskGroups to have separate network namespaces Key: MESOS-8534 URL: https://issues.apache.org/jira/browse/MESOS-8534 Project: Mesos Issue Type: Task Reporter: Sagar Sadashiv Patwardhan This is a placeholder. I will fill in more details after I have them.
[jira] [Updated] (MESOS-7882) Mesos master rescinds all the in-flight offers from all the registered agents when a new maintenance schedule is posted for a subset of slaves
[ https://issues.apache.org/jira/browse/MESOS-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sadashiv Patwardhan updated MESOS-7882: - Description: We are running Mesos 1.1.0 in production. We use a custom autoscaler for scaling our Mesos cluster up and down. While scaling down the cluster, the autoscaler makes a POST request to the Mesos master's /maintenance/schedule endpoint with a set of slaves to move to maintenance mode. This forces the Mesos master to rescind all the in-flight offers from *all the slaves* in the cluster. If our scheduler accepts one of these offers, we get a TASK_LOST status update back for that task. We also see log lines like these (https://gist.github.com/sagar8192/8858e7cb59a23e8e1762a27571824118) in the Mesos master logs. After reading the code (ref: https://github.com/apache/mesos/blob/master/src/master/master.cpp#L6772), it appears that offers are getting rescinded for all the slaves. I am not sure what the expected behavior is here, but it makes more sense if only resources from slaves marked for maintenance are reclaimed. *Experiment:* To verify that this is actually happening, I checked out the master branch (SHA: a31dd52ab71d2a529b55cd9111ec54acf7550ded) and added some log lines (https://gist.github.com/sagar8192/42ca055720549c5ff3067b1e6c7c68b3). I built the binary and started a Mesos master and 2 agent processes, and used a basic Python framework that launches docker containers on these slaves. I verified that there is no existing schedule for any slaves using `curl 10.40.19.239:5050/maintenance/status`. I then posted a maintenance schedule for one of the slaves (https://gist.github.com/sagar8192/fb65170240dd32a53f27e6985c549df0) after starting the Mesos framework. 
*Logs:* mesos-master: https://gist.github.com/sagar8192/91888419fdf8284e33ebd58351131203 mesos-slave1: https://gist.github.com/sagar8192/3a83364b1f5ffc63902a80c728647f31 mesos-slave2: https://gist.github.com/sagar8192/1b341ef2271dde11d276974a27109426 Mesos framework: https://gist.github.com/sagar8192/bcd4b37dba03bde0a942b5b972004e8a I think Mesos should rescind offers and inverse offers only for those slaves that are marked for maintenance (draining mode). > Mesos master rescinds all the in-flight offers from all the registered agents > when a new maintenance schedule is posted for a subset of slaves > -- > > Key: MESOS-7882 > URL: https://issues.apache.org/jira/browse/MESOS-7882 > Project: Mesos > Issue Type: Bug > Components: master > Affects Versions: 1.3.0 > Environment: Ubuntu 14.04 (trusty) > Mesos master branch. > SHA: a31dd52ab71d2a529b55cd9111ec54acf7550ded > Reporter: Sagar Sadashiv Patwardhan >
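The behavior suggested above can be sketched as a filter over in-flight offers. This is illustrative Python only; the actual fix would live in the master's C++ offer-rescinding path, and the offer/agent names are made up.

```python
# Rescind offers only from agents named in the newly posted maintenance
# schedule, instead of rescinding from every registered agent.
def offers_to_rescind(offers, schedule):
    scheduled = {
        (m["hostname"], m["ip"])
        for window in schedule["windows"]
        for m in window["machine_ids"]
    }
    return [o for o in offers if (o["hostname"], o["ip"]) in scheduled]

offers = [
    {"id": "O1", "hostname": "agent1", "ip": "10.0.0.1"},
    {"id": "O2", "hostname": "agent2", "ip": "10.0.0.2"},
]
schedule = {
    "windows": [{"machine_ids": [{"hostname": "agent1", "ip": "10.0.0.1"}]}]
}

# Only agent1's offer is rescinded; agent2's offer stays in flight.
print([o["id"] for o in offers_to_rescind(offers, schedule)])  # → ['O1']
```

Inverse offers would be handled the same way: generated only for the agents in the posted schedule.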
[jira] [Created] (MESOS-7882) Mesos master rescinds all the in-flight offers from all the registered agents when a new maintenance schedule is posted for a subset of slaves
Sagar Sadashiv Patwardhan created MESOS-7882: Summary: Mesos master rescinds all the in-flight offers from all the registered agents when a new maintenance schedule is posted for a subset of slaves Key: MESOS-7882 URL: https://issues.apache.org/jira/browse/MESOS-7882 Project: Mesos Issue Type: Bug Components: master Affects Versions: 1.3.0 Environment: Ubuntu 14.04 (trusty) Mesos master branch. SHA: a31dd52ab71d2a529b55cd9111ec54acf7550ded Reporter: Sagar Sadashiv Patwardhan Priority: Minor We are running Mesos 1.1.0 in production. We use a custom autoscaler for scaling our Mesos cluster up and down. While scaling down the cluster, the autoscaler makes a POST request to the Mesos master's /maintenance/schedule endpoint with a set of slaves to move to maintenance mode. This forces the Mesos master to rescind all the in-flight offers from *all the slaves* in the cluster. If our scheduler accepts one of these offers, we get a TASK_LOST status update back for that task. We also see log lines like these (https://gist.github.com/sagar8192/8858e7cb59a23e8e1762a27571824118) in the Mesos master logs. After reading the code (ref: https://github.com/apache/mesos/blob/master/src/master/master.cpp#L6772), it appears that offers are getting rescinded for all the slaves. I am not sure what the expected behavior is here, but it makes more sense if only resources from slaves marked for maintenance are reclaimed. Experiment: To verify that this is actually happening, I checked out the master branch (SHA: a31dd52ab71d2a529b55cd9111ec54acf7550ded) and added some log lines (https://gist.github.com/sagar8192/42ca055720549c5ff3067b1e6c7c68b3). I built the binary and started a Mesos master and 2 agent processes, and used a basic Python framework that launches docker containers on these slaves. I verified that there is no existing schedule for any slaves using `curl 10.40.19.239:5050/maintenance/status`. 
I then posted a maintenance schedule for one of the slaves (https://gist.github.com/sagar8192/fb65170240dd32a53f27e6985c549df0) after starting the Mesos framework. Logs: mesos-master: https://gist.github.com/sagar8192/91888419fdf8284e33ebd58351131203 mesos-slave1: https://gist.github.com/sagar8192/3a83364b1f5ffc63902a80c728647f31 mesos-slave2: https://gist.github.com/sagar8192/1b341ef2271dde11d276974a27109426 Mesos framework: https://gist.github.com/sagar8192/bcd4b37dba03bde0a942b5b972004e8a I think Mesos should rescind offers and inverse offers only for those slaves that are marked for maintenance (draining mode). -- This message was sent by Atlassian JIRA (v6.4.14#64029)