Re: RFC: RevocableInfo Changes
Here's some input :).

If throttling is tolerable but preemption is not, how would that be expressed? (Is that supported?)

[Klaus]: It's not supported. Only revocable resources have this attribute: they are either throttleable or non-throttleable. Throttleable revocable resources are reported by the ResourceEstimator, which means the resources may be throttled by their original owner.

How does this work with the QoS controller? Will there be a new correction type to indicate throttling, or does throttling happen "behind the agent's back"?

[Klaus]: The QoSController/ResourceEstimator only manages throttleable revocable resources; the other resources (regular resources and non-throttleable revocable resources) are managed by the allocator. Here "manage" means generation and destruction/eviction. As for how throttling happens, good question. I think the throttling will depend on the containers; let me double check :).

If you have any comments, please let me know.


Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform OpenSource Technology, STG, IBM GCG
+86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me

On Sat, Mar 19, 2016 at 11:15 PM, wrote:

> Thanks for the good explanations so far Ben and Klaus. Apologies if you
> guys already covered these questions in the meeting:
>
> If throttling is tolerable but preemption is not, how would that be
> expressed? (Is that supported?)
>
> How does this work with the QoS controller? Will there be a new correction
> type to indicate throttling, or does throttling happen "behind the agent's
> back"?
>
> Thanks,
> --
> Connor
>
> > On Mar 19, 2016, at 04:01, Klaus Ma wrote:
> >
> > @team, in the latest meeting, we agreed to keep the current name,
> > ThrottleInfo.
> >
> > If there are any more comments, please let me know.
> >
> >> On Wednesday, March 16, 2016 at 9:32:37 PM UTC+8, Guangya Liu wrote:
> >> Also, please share any comments on the name here. The current name is
> >> ThrottleInfo. The Kubernetes resource QoS design document uses
> >> "scavenging" as the key word for this behaviour, so a possible name
> >> here could be ScavengeInfo. Please comment on these two names, or
> >> propose a new one.
> >>
> >> message RevocableInfo {
> >>   message ThrottleInfo {}
> >>
> >>   // If set, indicates that the resources may be throttled at
> >>   // any time. Throttle-able resources can be used for tasks
> >>   // that do not have strict performance requirements and are
> >>   // capable of handling being throttled.
> >>   optional ThrottleInfo throttle_info = 1;
> >> }
> >>
> >> On Wednesday, March 16, 2016 at 10:24:14 AM UTC+8, Klaus Ma wrote:
> >>>
> >>> The patches are updated accordingly; JIRA: MESOS-3888, RR:
> >>> https://reviews.apache.org/r/40375/ .
> >>>
> >>> Thanks
> >>> klaus
> >>>
> >>> On Saturday, March 12, 2016 at 11:09:46 AM UTC+8, Benjamin Mahler wrote:
> >>>> Hey folks,
> >>>>
> >>>> In the resource allocation working group we've been looking into a
> >>>> few projects that will make the allocator able to offer out resources
> >>>> as revocable. For example:
> >>>>
> >>>> - We'll want to eventually allocate resources as revocable _by
> >>>>   default_, only allowing non-revocable when there are guarantees put
> >>>>   in place (static reservations or quota).
> >>>>
> >>>> - On the path to revocable by default, we can incrementally start to
> >>>>   offer certain resources as revocable. Consider when quota is set but
> >>>>   the role isn't using all of the quota. The unallocated quota can be
> >>>>   offered to other roles, but it should be revocable because we may
> >>>>   revoke them should the quota'ed role want to use the resources.
> >>>>   Unused reservations fall into a similar category.
> >>>>
> >>>> - Going revocable by default also allows us to enforce fairness in a
> >>>>   dynamically changing cluster by revoking resources as weights are
> >>>>   changed, frameworks are added or removed, etc.
> >>>>
> >>>> In this context, "revocable" means that the resources may be taken
> >>>> away and the container will be destroyed. The meaning of "revocable"
> >>>> in the context of usage oversubscription includes this, but also the
> >>>> container may experience throttling (e.g. lower cpu shares, less
> >>>> network priority, etc).
> >>>>
> >>>> For this reason, and because we internally need to distinguish
> >>>> revocable resources between those that are generated by usage
> >>>> oversubscription and those that are generated by the allocator, we're
> >>>> thinking of the following change to the API:
> >>>>
> >>>> - message RevocableInfo {}
> >>>> + message RevocableInfo {
> >>>> +   message ThrottleInfo {}
> >>>> +
> >>>> +   // If set, indicates that the resources may be throttled at
> >>>> +   // any time. Throttle-able resources can be used for tasks
> >>>> +   // that do not have strict performance requirements and are
> >>>> +   // capable of handling being throttled.
> >>>> +   optional ThrottleInfo throttle_info;
> >>>> + }
> >>>>
> >>>> // If this is set, the
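
The three resource categories discussed in this thread (regular, non-throttleable revocable, throttleable revocable) could be modelled roughly as follows. This is only a minimal sketch in Python, not Mesos code: the `Resource` dataclass and the `classify` and `managed_by` helpers are hypothetical names made up for illustration, and the mapping of categories to components simply restates Klaus's description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThrottleInfo:
    """Empty marker, mirroring `message ThrottleInfo {}` in the proposal."""


@dataclass(frozen=True)
class RevocableInfo:
    # Set only when the resources may be throttled at any time.
    throttle_info: Optional[ThrottleInfo] = None


@dataclass(frozen=True)
class Resource:
    # Hypothetical, simplified stand-in for a Mesos resource.
    name: str
    value: float
    # None means a regular (non-revocable) resource.
    revocable: Optional[RevocableInfo] = None


def classify(resource: Resource) -> str:
    """Return which of the three categories a resource falls into."""
    if resource.revocable is None:
        return "regular"
    if resource.revocable.throttle_info is None:
        return "non-throttleable revocable"
    return "throttleable revocable"


def managed_by(resource: Resource) -> str:
    """Component that generates and evicts the resource, per the thread."""
    if classify(resource) == "throttleable revocable":
        return "QoSController/ResourceEstimator"
    return "allocator"


if __name__ == "__main__":
    offers = [
        Resource("cpus", 2.0),
        Resource("cpus", 1.0, RevocableInfo()),
        Resource("cpus", 0.5, RevocableInfo(ThrottleInfo())),
    ]
    for r in offers:
        print(f"{r.name}={r.value}: {classify(r)}, managed by {managed_by(r)}")
```

Under this reading, a framework that can tolerate neither throttling nor preemption would only accept resources classified as "regular"; as noted above, there is no way to express "throttling is fine but preemption is not".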
Re: Mesos Cgroups Unified Isolator Design
Thanks a lot for your detailed revisions!

On Mon, Mar 21, 2016 at 6:59 AM, Erik Weathers wrote:

> gave a bunch of minor grammar comments to make the wording flow.
>
> On Sun, Mar 20, 2016 at 11:21 AM, haosdent wrote:
>
>> Dear friends. I hope you have a good weekend.
>>
>> Jie Yu and I are working on adding a new cgroups unified isolator to
>> solve the problems we encountered in cgroups isolation.
>>
>> For more details about it, we recorded it in this epic:
>>
>> [Consolidate cgroup isolators into one single isolator]:
>> https://issues.apache.org/jira/browse/MESOS-4697
>>
>> For the design of it, we describe it in this design document:
>>
>> [Mesos Cgroups Unified Isolator Design]:
>> https://docs.google.com/document/d/1rAAzymtY5tcXY9X-Ryz6tEFeWA1_VBhnFYoE7M2kbvk/edit?usp=sharing
>>
>> Any comments and suggestions are appreciated!
>>
>> --
>> Best Regards,
>> Haosdent Huang

--
Best Regards,
Haosdent Huang
Mesos Cgroups Unified Isolator Design
Dear friends. I hope you have a good weekend.

Jie Yu and I are working on adding a new cgroups unified isolator to solve the problems we encountered in cgroups isolation.

For more details about it, we recorded it in this epic:

[Consolidate cgroup isolators into one single isolator]:
https://issues.apache.org/jira/browse/MESOS-4697

For the design of it, we describe it in this design document:

[Mesos Cgroups Unified Isolator Design]:
https://docs.google.com/document/d/1rAAzymtY5tcXY9X-Ryz6tEFeWA1_VBhnFYoE7M2kbvk/edit?usp=sharing

Any comments and suggestions are appreciated!

--
Best Regards,
Haosdent Huang
Re: Backport r/44230 to 0.27 branch
Zameer, thanks for the input. I think we should discuss that in the next community sync (can you join?).

Vinod did some analysis on how people feel about the release cadence, but I don't see that the results have been published. Should we discuss that again and come up with some concrete action items?

- Jie

On Wed, Mar 16, 2016 at 11:57 AM, Zameer Manji wrote:

> Cong brings up a good point here. Currently Mesos has a very aggressive
> release cadence. This results in several questions as a cluster operator
> and framework author.
>
>    - What is the support from the community/committers for each release?
>    - Do cluster operators and framework authors need to move at the same
>      pace as the community?
>    - Will bugfixes be automatically backported?
>
> The lack of clarity here can result in several issues because it is easy
> for the Mesos PMC to cut releases quickly, but it isn't easy for people
> with existing clusters to upgrade at that pace. An aggressive release
> policy without clear support for older releases can leave several users in
> a bad position where they might need to upgrade Mesos through one (or
> more!) releases just to get a critical bugfix.
>
> On Wed, Mar 16, 2016 at 11:44 AM, Cong Wang wrote:
>
> > On Tue, Mar 15, 2016 at 2:39 PM, Jie Yu wrote:
> > > Mesos currently has no notion of long term stable releases (i.e.,
> > > LTS). I think the consensus in the last community sync was to
> > > introduce LTS after 1.0.
> >
> > You don't need LTS as kernel, even talking about short term stable
> > releases like 0.27.2 (?), they look horrible too, I don't see any git
> > tags or branches for these releases, just a tar ball?! Huh...
> >
> > > 0.27.2 has already been released. Looks like we need 0.27.3 if we
> > > want to backport it.
> >
> > What determines which patches need to backport for Mesos community?
> > It doesn't look like every bug fix is evaluated and considered after
> > they are merged into master branch.
> >
> > > I am OK with back porting it. Then the question is that whether we
> > > want to backport it to other releases as well.
> >
> > It should be backported to whichever releases it applies to and you
> > support, I don't see Mesos community has such a procedure.
>
> --
> Zameer Manji
Re: [RESULT][VOTE] Release Apache Mesos 0.28.0 (rc2)
Here is a link to the rpm/deb packages:
http://open.mesosphere.com/downloads/mesos/#apache-mesos-0.28.0

Best,
Kapil

On Thu, Mar 17, 2016 at 2:33 PM, Vinod Kone wrote:

> +1
>
> @vinodkone
>
> On Mar 17, 2016, at 11:27 AM, Bill Farner wrote:
>
> Jake - i think that would be wonderful!
>
> On Thu, Mar 17, 2016 at 11:17 AM, Jake Farrell wrote:
>
>> I've been maintaining a deb/rpm set for Mesos and for Aurora and Thrift
>> we have been using the infra supported Bintray to make it available to
>> the community via http://www.apache.org/dist/${project}/${os}
>>
>> If there is interest I'd be happy to put some time into bringing my
>> patches into reviews and helping setup jenkins tests, etc.
>>
>> -Jake
>>
>> On Thu, Mar 17, 2016 at 1:41 PM, Vinod Kone wrote:
>>
>> > The project itself doesn't officially release rpms/debs, but the
>> > community members do. For example, Mesosphere is planning to release
>> > rpms/debs shortly.
>> >
>> > On Thu, Mar 17, 2016 at 10:38 AM, craig w wrote:
>> >
>> > > Great news. Do the rpm's get automatically built and released or
>> > > will they come later this week?
>> > >
>> > > On Thu, Mar 17, 2016 at 1:28 PM, Vinod Kone wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> The vote for Mesos 0.28.0 (rc2) has passed with the
>> > >> following votes.
>> > >>
>> > >> +1 (Binding)
>> > >> --
>> > >> Vinod Kone
>> > >> Michael Park
>> > >> Kapil Arya
>> > >>
>> > >> +1 (Non-binding)
>> > >> --
>> > >> Greg Mann
>> > >> Daniel Osborne
>> > >> Jorg Schad
>> > >> Zhitao Li
>> > >>
>> > >> There were no 0 or -1 votes.
>> > >>
>> > >> Please find the release at:
>> > >> https://dist.apache.org/repos/dist/release/mesos/0.28.0
>> > >>
>> > >> It is recommended to use a mirror to download the release:
>> > >> http://www.apache.org/dyn/closer.cgi
>> > >>
>> > >> The CHANGELOG for the release is available at:
>> > >> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.28.0
>> > >>
>> > >> The mesos-0.28.0.jar has been released to:
>> > >> https://repository.apache.org
>> > >>
>> > >> The website (http://mesos.apache.org) will be updated shortly to
>> > >> reflect this release.
>> > >>
>> > >> Thanks,
>> > >
>> > > --
>> > >
>> > > https://github.com/mindscratch
>> > > https://www.google.com/+CraigWickesser
>> > > https://twitter.com/mind_scratch
>> > > https://twitter.com/craig_links
Re: Backport r/44230 to 0.27 branch
> like many other review requests are burned or take 6+ months to merge

Have you reached out to any shepherd for that ticket/review?

- Jie

On Wed, Mar 16, 2016 at 12:11 PM, Cong Wang wrote:

> On Wed, Mar 16, 2016 at 11:58 AM, Jie Yu wrote:
> >
> > Currently, it's based on request. We definitely need to improve this
> > part.
>
> It simply doesn't work, like many other review requests are burned or take
> 6+ months to merge. I am sure you need to improve that too, but after
> watching Mesos community for months, I don't see any improvement yet.
Re: Backport r/44230 to 0.27 branch
> Why not check your backlog for your answer? Or do you need me to write
> a script to scan all the pending review requests for you?

OK, I just looked at your pending patches:
https://reviews.apache.org/users/wangcong/?show-closed=0

The associated tickets:
https://issues.apache.org/jira/browse/MESOS-4740
https://issues.apache.org/jira/browse/MESOS-2769
https://issues.apache.org/jira/browse/MESOS-2799
(Some of the RB requests do not have associated tickets.)

I don't see a shepherd for MESOS-4740. Looks like Vinod is the shepherd for MESOS-2769. MESOS-2799 does not have a shepherd either, but I think that should be me. Are you still interested in shipping those patches?

I think you made a valid point that there are some open problems:

1) Do we want to work on all created tickets (i.e., how do we decide whether to accept a ticket or not), and who decides that?
2) Once we accept a ticket, how do we prioritize it against the others? Should PMC members groom the accepted tickets regularly?
3) If no committer volunteers for an accepted ticket, what's the procedure in that case? Should we pick one?
4) What's the procedure for finding another shepherd if the original shepherd no longer has time for the ticket?

- Jie

On Wed, Mar 16, 2016 at 2:32 PM, Cong Wang wrote:

> On Wed, Mar 16, 2016 at 2:21 PM, Jie Yu wrote:
> > I understand your frustration. I am curious what review/ticket are you
> > talking about, and who is the shepherd for your review/ticket?
>
> Why not check your backlog for your answer? Or do you need me to write
> a script to scan all the pending review requests for you?
>
> > Mesos project has a clear guide how to contribute to the project, that's
> > what the community has agreed on:
> >
> > https://github.com/apache/mesos/blob/master/docs/submitting-a-patch.md#before-you-start-writing-code
>
> I assume this doesn't apply to your committers, at least BenM:
>
> commit 152ac2b13916bcf2bb9e52accc4951c3ce5bfd76
> Author: Benjamin Mahler
> Date:   Sun Feb 21 14:22:07 2016 +0100
>
>     Log the shutdown duration in the executor driver.
>
> commit 1488f16d283f69b7dc96feaee91b04a09012ca4a
> Author: Benjamin Mahler
> Date:   Sat Feb 20 17:35:30 2016 +0100
>
>     Added TASK_KILLING to the API changes in the CHANGELOG.
>
> commit 978ccb5dd637f0e1577ecae1e21973f50429b04c
> Author: Benjamin Mahler
> Date:   Sat Feb 20 17:28:58 2016 +0100
>
>     Added docker executor tests for TASK_KILLING.
>
> commit ee86b13633a9469629dbd79681d0776b6020f76a
> Author: Benjamin Mahler
> Date:   Sat Feb 20 16:18:22 2016 +0100
>
>     Added command executor tests for TASK_KILLING.
>
> commit 25d303d8743b524c92627d48f7dfb7ac2a921ede
> Author: Benjamin Mahler
> Date:   Sat Feb 20 15:31:28 2016 +0100
>
>     Fixed health check process leak when shutdown is called without
>     killTask.
>
> > "Find a shepherd to collaborate on your patch. A shepherd is a Mesos
> > committer that will work with you to give you feedback on your proposed
> > design, and to eventually commit your change into the Mesos source tree."
>
> This doesn't work, and it needs to change. I already state my reason in the
> previous reply, which is just ignored, yeah, like many other requests.
Re: Backport r/44230 to 0.27 branch
Maybe we can try to draft a formal guideline about when/how something should be backported, and make sure interested parties in the community have a chance to get their voices heard?

I'm also interested in knowing how much work it takes to cut releases with backports, and how the community could help.

On Wed, Mar 16, 2016 at 12:11 PM, Cong Wang wrote:

> On Wed, Mar 16, 2016 at 11:58 AM, Jie Yu wrote:
> >
> > Currently, it's based on request. We definitely need to improve this
> > part.
>
> It simply doesn't work, like many other review requests are burned or take
> 6+ months to merge. I am sure you need to improve that too, but after
> watching Mesos community for months, I don't see any improvement yet.

--
Cheers,
Zhitao Li