Hi Ben,

Thanks for reading the proposal. There are several motivations, although scalability is the primary one:

1) W.r.t. scalability, it's not only Mesos's own scalability, but also many *additional infra tools* which need to integrate with Mesos and process *every* task in the cluster: a 2-3x increase in the number of tasks would easily make it harder for these systems to keep up with cluster size.

2) Another thing we are looking at is providing a more robust and powerful upgrade story for a pod of containers. Although such work does not demand modeling multiple containers as one task, our internal discussions suggest that this modeling makes it easier to handle. A couple of things we are specifically looking at:

- reliable in-place upgrade: while dynamic reservation usually works, it's still non-trivial to provide an exact guarantee that the allocator/master will send back offers in time after a `KILL`. This is technically more related to MESOS-1280 <https://issues.apache.org/jira/browse/MESOS-1280>.
- automatic rollback upon failed upgrade: similar to the point above, it would be great if the entire scheduler/Mesos stack could guarantee an atomic rollback. Right now this depends on the availability of the entire control plane (scheduler and master), since multiple messages need to be passed.
- zero-traffic-loss upgrade: if the workload utilizes primitives like SO_REUSEPORT <https://lwn.net/Articles/542629/>, it should be possible to upgrade a container w/o losing any customer traffic.

3) Another awkwardness of TaskGroup is that we do not really know how to properly size a task within a group, because the tasks are isolated within the same root container's scope, nor do we really care from a scheduler's perspective. Sizing the sum of the containers is far more important to us than sizing each task.

4) Also, it seems like we cannot add a new "zero resource usage" task to a group right now, therefore adding/removing a container has to involve both the "scheduling" logic and the "container upgrade" part.

The last two points came from internal discussion with our scheduler team.
I guess they may not be as significant as the first two, but I'm just putting them on the table.

On Thu, Jun 15, 2017 at 2:43 PM, Benjamin Mahler <bmah...@apache.org> wrote:

> From reading this, the motivation is that TaskGroup having 1 task per
> container "could create a scalability issue for a large scale Mesos cluster
> since many endpoints/operations scale with the total number of Tasks in the
> cluster."
>
> Is that the only motivation here?
>
> On Thu, Jun 15, 2017 at 11:45 AM, Charles Raimbert <craimber...@gmail.com>
> wrote:
>
>> Hello All,
>>
>> As we are interested in PODs to run colocated containers under the same
>> grouping, we have been looking at TaskGroup but we have also been working
>> on a design to allow multiple containers in the same Task.
>>
>> Please feel free to write your comments and suggestions on the proposal
>> draft:
>> https://docs.google.com/document/d/1Os5tXUJfJ8Op_YBZR7L8hSHqIeO1f9LY2yzKxsOdrwg
>>
>> Thanks,
>> Charles Raimbert & Zhitao Li
>>
>
>

--
Cheers,
Zhitao Li