We shared this about a month ago in a tweet: support for Docker container pods in a single task (namespace collapse and resource sharing with the parent Mesos task), built to satisfy needs where we had to treat Mesos, Docker and docker-compose as first-class citizens in our ecosystem.

Slides: https://lnkd.in/gK8rNJ8
Video: https://lnkd.in/g5MAsk9
Source: https://github.com/paypal/dce-go

Mesos still provides the most flexible primitives, compared with competing solutions, for building the solutions you need. I agree with Jie: the native Mesos generic pod integration via task groups and nested containers should probably have one recommended model, and if that model does not satisfy a use case, there are other ways to achieve it.

Thx

From: Yan Xu <xuj...@apple.com>
To: dev <dev@mesos.apache.org>
Sent: Wednesday, June 21, 2017 11:29 AM
Subject: Re: [Proposal] Multiple Containers in Single Mesos Task

---
@xujyan <https://twitter.com/xujyan>

On Fri, Jun 16, 2017 at 8:57 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:

> Hi Ben,
>
> Thanks for reading the proposal. There are several motivations, although
> scalability is the primary one:
>
> 1) w.r.t. scalability, it's not only Mesos's own scalability, but also
> many *additional infra tools* which need to integrate with Mesos and process
> *every* task in the cluster: a 2-3x increase in task numbers would easily
> make it harder for these systems to keep up with cluster size;

Have you looked into what the bottleneck is for these tools? In our experience, what hurts scalability most is not the number of tasks but the size of the metadata that needs to be processed (per task). I am interested in seeing whether this is still an issue if we improve the APIs by stripping out unnecessary fields, introducing an API for querying individual tasks, etc.

> 2) Another thing we are looking at is providing a more robust and powerful
> upgrade story for a pod of containers. Although such work does not demand
> modeling multiple containers as one task, our internal discussions suggest
> that this modeling makes it easier to handle. A couple of things we are
> specifically looking at:

How do the following benefit from `additional_containers` instead of tasks?
>
> - reliable in-place upgrade: while dynamic reservation usually works,
> it's still non-trivial to provide an exact guarantee that the allocator/master
> will send back offers in time after a `KILL`. This is technically more
> related to MESOS-1280 <https://issues.apache.org/jira/browse/MESOS-1280>.
>

This is interesting to us too, let's sync on this.

> - automatic rollback upon failed upgrade: similar to the above point, it'll
> be great if the entire scheduler/Mesos stack can guarantee an atomic
> rollback. Right now this depends on the availability of the entire control
> plane (scheduler and master) since multiple messages need to be passed.
> - zero-traffic-loss upgrade: if the workload utilizes primitives like
> SO_REUSEPORT <https://lwn.net/Articles/542629/>, it should be possible
> to upgrade a container w/o losing any customer traffic.
>

If you update the task in-place, you wouldn't even necessarily need to restart the process. I assume you are talking about cases where you have to, but then it has to be supported by Mesos not moving your task to a random host, right?

>
> 3) Another awkwardness of TaskGroup is that we do not really know how to
> properly size a task within a group because the tasks are isolated by the
> same root container's scope, nor do we really care from a scheduler's
> perspective. Sizing the sum of the containers is far more important to us
> than sizing each task.
>

This is interesting. Right now resources of the tasks within the same group aren't isolated individually; they are bundled together anyway. In the long run, when we start isolating them, perhaps we can make task resources optional if the tasks are launched by a `LaunchGroup` operation?

>
> 4) Also, it seems like we cannot add a new "zero resource usage" task to a
> group right now, therefore adding/removing a container has to involve both
> the "scheduling" logic and the "container upgrade" part.
>

I guess you mean the scheduler shouldn't need to wait for new offers to simply update the current task, so I think this is the same point as 1)?

>
> The last two points came from internal discussion with our scheduler team.
> I guess they may not be as significant as the first two, but I'm just
> putting them on the table.
>
>
> On Thu, Jun 15, 2017 at 2:43 PM, Benjamin Mahler <bmah...@apache.org>
> wrote:
>
> > From reading this, the motivation is that TaskGroup having 1 task per
> > container "could create a scalability issue for a large scale Mesos
> > cluster since many endpoints/operations scale with the total number of
> > Tasks in the cluster."
> >
> > Is that the only motivation here?
> >
> > On Thu, Jun 15, 2017 at 11:45 AM, Charles Raimbert <craimber...@gmail.com>
> > wrote:
> >
> >> Hello All,
> >>
> >> As we are interested in PODs to run colocated containers under the same
> >> grouping, we have been looking at TaskGroup, but we have also been
> >> working on a design to allow multiple containers in the same Task.
> >>
> >> Please feel free to write your comments and suggestions on the proposal
> >> draft:
> >> https://docs.google.com/document/d/1Os5tXUJfJ8Op_YBZR7L8hSHqIeO1f9LY2yzKxsOdrwg
> >>
> >> Thanks,
> >> Charles Raimbert & Zhitao Li
> >>
> >
> >
>
> --
> Cheers,
>
> Zhitao Li
>
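The zero-traffic-loss upgrade point in the thread relies on the kernel's `SO_REUSEPORT` socket option (the LWN article linked in the thread): the old and the new container's processes can both bind the same port at once, so the new one is already accepting before the old one closes its listener. Below is a minimal Python sketch of that handover on a single host, assuming Linux 3.9+ and both sockets owned by the same user. It is illustrative only, not how Mesos or dce-go implement upgrades, and the helper names `reuseport_listener` and `handover_demo` are hypothetical.

```python
import socket


def reuseport_listener(port: int) -> socket.socket:
    """Create a TCP listener on 127.0.0.1 with SO_REUSEPORT set.

    Assumes Linux 3.9+; every socket sharing the port must set the
    option before bind(), and all of them must belong to the same user.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen()
    return sock


def handover_demo() -> tuple:
    # "Old" container's listener; port 0 lets the kernel pick a free port.
    old = reuseport_listener(0)
    port = old.getsockname()[1]

    # "New" container binds the same port while the old one is still open.
    new = reuseport_listener(port)
    new.settimeout(5)  # avoid hanging forever if something goes wrong

    # The old listener closes; the port stays reachable through `new`.
    old.close()

    # A client connecting now is served by the new listener.
    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _ = new.accept()
    client.sendall(b"ping")

    # Read until all 4 bytes arrive (recv may return fewer).
    data = b""
    while len(data) < 4:
        chunk = conn.recv(4 - len(data))
        if not chunk:
            break
        data += chunk

    for s in (conn, client, new):
        s.close()
    return port, data


if __name__ == "__main__":
    print(handover_demo())
```

One caveat for a real upgrade: on Linux, connections already sitting in the old listener's accept queue when it closes can be reset, so the old process should stop accepting and drain its backlog before closing rather than closing abruptly.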