It looks like a Pod. How about upgrading the TASK concept to a POD concept?

2017-06-16 23:57 GMT+08:00 Zhitao Li <zhitaoli...@gmail.com>:

> Hi Ben,
>
> Thanks for reading the proposal. There are several motivations, although
> scalability is the primary one:
>
> 1) W.r.t. scalability, it's not only Mesos's own scalability, but also
> *many additional infra tools* which need to integrate with Mesos and
> process *every* task in the cluster: a 2-3x increase in task numbers would
> easily make it harder for these systems to keep up with cluster size;
> 2) Another thing we are looking at is providing a more robust and powerful
> upgrade story for a pod of containers. Although such work does not demand
> modeling multiple containers as one task, our internal discussions suggest
> that this modeling makes it easier to handle. A couple of things we are
> specifically looking at:
>
>    - reliable in-place upgrade: while dynamic reservation usually works,
>    it's still non-trivial to provide an exact guarantee that the
>    allocator/master will send back offers in time after a `KILL`. This is
>    technically more related to MESOS-1280
>    <https://issues.apache.org/jira/browse/MESOS-1280>.
>    - automatic rollback upon failed upgrade: similar to the above point,
>    it would be great if the entire scheduler/Mesos stack could guarantee an
>    atomic rollback. Right now this depends on the availability of the entire
>    control plane (scheduler and master), since multiple messages need to be
>    passed.
>    - zero-traffic-loss upgrade: if the workload utilizes primitives like
>    SO_REUSEPORT <https://lwn.net/Articles/542629/>, it should be possible
>    to upgrade a container w/o losing any customer traffic.
>
> 3) Another awkwardness of TaskGroup is that we do not really know how to
> properly size a task within a group, because the tasks are isolated by the
> same root container's scope; nor do we really care from a scheduler's
> perspective. Sizing the sum of the containers is far more important to us
> than sizing each task.
> 4) Also, it seems like we cannot add a new "zero resource usage" task to a
> group right now, so adding/removing a container has to involve both the
> "scheduling" logic and the "container upgrade" part.
>
> The last two points came from internal discussion with our scheduler team.
> I guess they may not be as significant as the first two, but I'm just
> putting them on the table.
>
>
> On Thu, Jun 15, 2017 at 2:43 PM, Benjamin Mahler <bmah...@apache.org>
> wrote:
>
> > From reading this, the motivation is that TaskGroup having 1 task per
> > container "could create a scalability issue for a large scale Mesos
> > cluster since many endpoints/operations scale with the total number of
> > Tasks in the cluster."
> >
> > Is that the only motivation here?
> >
> > On Thu, Jun 15, 2017 at 11:45 AM, Charles Raimbert
> > <craimber...@gmail.com> wrote:
> >
> >> Hello All,
> >>
> >> As we are interested in PODs to run colocated containers under the same
> >> grouping, we have been looking at TaskGroup, but we have also been
> >> working on a design to allow multiple containers in the same Task.
> >>
> >> Please feel free to write your comments and suggestions on the proposal
> >> draft:
> >> https://docs.google.com/document/d/1Os5tXUJfJ8Op_YBZR7L8hSHqIeO1f9LY2yzKxsOdrwg
> >>
> >> Thanks,
> >> Charles Raimbert & Zhitao Li
> >>
> >
> >
>
> --
> Cheers,
>
> Zhitao Li

--
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com