Re: RFC: RevocableInfo Changes

2016-03-20 Thread Klaus Ma
Here's some input :).

If throttling is tolerable but preemption is not, how would that be
expressed? (Is that supported?)
[Klaus]: It's not supported; only revocable resources have this attribute:
non-throttleable or throttleable. Throttleable revocable resources are
reported by the ResourceEstimator, which means the resources may be throttled
by their original owner.

How does this work with the QoS controller? Will there be a new correction
type to indicate throttling, or does throttling happen "behind the agent's
back"?
[Klaus]: The QoSController/ResourceEstimator only manages throttleable
revocable resources; the other resources (regular resources and
non-throttleable revocable resources) are managed by the allocator. By
"manage" I mean generation and destruction/eviction. Regarding where the
throttling happens, good question. I think the throttling will depend on the
containerizer; let me double-check :).
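The split Klaus describes can be sketched in code (a hedged illustration, not Mesos source; the class and field names mirror the proposed RevocableInfo/ThrottleInfo proto messages but are illustrative only):

```python
# Hedged sketch: models the distinction between revocable resources
# generated by the allocator (no ThrottleInfo) and those reported by the
# ResourceEstimator (throttleable usage slack, managed by the QoSController).
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThrottleInfo:
    pass  # marker message, as in the proposed proto


@dataclass
class RevocableInfo:
    throttle_info: Optional[ThrottleInfo] = None


@dataclass
class Resource:
    name: str
    revocable: Optional[RevocableInfo] = None


def managed_by(resource: Resource) -> str:
    """Who generates/destroys this resource, per the discussion above."""
    if resource.revocable is None:
        return "allocator"       # regular resource
    if resource.revocable.throttle_info is None:
        return "allocator"       # allocator-generated revocable resource
    return "qos-controller"      # usage slack from the ResourceEstimator


regular = Resource("cpus")
allocator_revocable = Resource("cpus", RevocableInfo())
usage_slack = Resource("cpus", RevocableInfo(ThrottleInfo()))

print(managed_by(regular))              # allocator
print(managed_by(allocator_revocable))  # allocator
print(managed_by(usage_slack))          # qos-controller
```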

If you have any comments, please let me know.


Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform OpenSource Technology, STG, IBM GCG
+86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me

On Sat, Mar 19, 2016 at 11:15 PM,  wrote:

> Thanks for the good explanations so far Ben and Klaus.  Apologies if you
> guys already covered these questions in the meeting:
>
> If throttling is tolerable but preemption is not, how would that be
> expressed? (Is that supported?)
>
> How does this work with the QoS controller? Will there be a new correction
> type to indicate throttling, or does throttling happen "behind the agent's
> back"?
>
> Thanks,
> --
> Connor
>
> > On Mar 19, 2016, at 04:01, Klaus Ma  wrote:
> >
> > @team, in the latest meeting, we agreed to keep the current name ThrottleInfo.
> >
> > If there are any more comments, please let me know.
> >
> >> On Wednesday, March 16, 2016 at 9:32:37 PM UTC+8, Guangya Liu wrote:
> >> Also, please share any comments on the name. The current
> name is ThrottleInfo; the Kubernetes resource QoS design document uses
> "scavenging" as the keyword for this behaviour, so a possible name
> here could be ScavengeInfo. Please comment on either of these two
> names, or propose a new one.
> >>
> >> message RevocableInfo {
> >> message ThrottleInfo {}
> >>
> >> // If set, indicates that the resources may be throttled at
> >> // any time. Throttle-able resources can be used for tasks
> >> // that do not have strict performance requirements and are
> >> // capable of handling being throttled.
> >> optional ThrottleInfo throttle_info = 1;
> >>   }
> >>
> >> On Wednesday, March 16, 2016 at 10:24:14 AM UTC+8, Klaus Ma wrote:
> >>>
> >>> The patches are updated accordingly; JIRA: MESOS-3888 , RR:
> https://reviews.apache.org/r/40375/ .
> >>>
> >>> Thanks
> >>> klaus
> >>>
>  On Saturday, March 12, 2016 at 11:09:46 AM UTC+8, Benjamin Mahler
> wrote:
>  Hey folks,
> 
>  In the resource allocation working group we've been looking into a
> few projects that will make the allocator able to offer out resources as
> revocable. For example:
> 
>  - We'll want to eventually allocate resources as revocable _by
> default_, only allowing non-revocable when there are guarantees put in
> place (static reservations or quota).
> 
>  - On the path to revocable by default, we can incrementally start to
> offer certain resources as revocable. Consider when quota is set but the
> role isn't using all of the quota. The unallocated quota can be offered to
> other roles, but it should be revocable because we may revoke them should
> the quota'ed role want to use the resources. Unused reservations fall into
> a similar category.
> 
>  - Going revocable by default also allows us to enforce fairness in a
> dynamically changing cluster by revoking resources as weights are changed,
> frameworks are added or removed, etc.
> 
>  In this context, "revocable" means that the resources may be taken
> away and the container will be destroyed. The meaning of "revocable" in the
> context of usage oversubscription includes this, but the container may also
> experience throttling (e.g. lower cpu shares, less network priority, etc.).
> 
>  For this reason, and because we internally need to distinguish
> revocable resources between those that are generated by usage
> oversubscription and those that are generated by the allocator, we're
> thinking of the following change to the API:
> 
> 
> 
>  -  message RevocableInfo {}
>  +  message RevocableInfo {
>  +message ThrottleInfo {}
>  +
>  +// If set, indicates that the resources may be throttled at
>  +// any time. Throttle-able resources can be used for tasks
>  +// that do not have strict performance requirements and are
>  +// capable of handling being throttled.
>  +optional ThrottleInfo throttle_info = 1;
>  +  }
> 
> // If this is set, the 

Re: Mesos Cgroups Unified Isolator Design

2016-03-20 Thread haosdent
Thanks a lot for your detailed revisions!

On Mon, Mar 21, 2016 at 6:59 AM, Erik Weathers 
wrote:

> I gave a bunch of minor grammar comments to make the wording flow.
>
> On Sun, Mar 20, 2016 at 11:21 AM, haosdent  wrote:
>
>> Dear friends. I hope you have a good weekend.
>>
>> Jie Yu and I are working on adding a new cgroups unified isolator to solve
>> the
>> problems we encountered in cgroups isolation.
>>
>> For more details about it, we recorded it in this epic:
>>
>> [Consolidate cgroup isolators into one single isolator]:
>> https://issues.apache.org/jira/browse/MESOS-4697
>>
>> For the design of it, we describe it in this design document:
>>
>> [Mesos Cgroups Unified Isolator Design]:
>>
>> https://docs.google.com/document/d/1rAAzymtY5tcXY9X-Ryz6tEFeWA1_VBhnFYoE7M2kbvk/edit?usp=sharing
>>
>> Any comments and suggestions are appreciated!
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


-- 
Best Regards,
Haosdent Huang


Mesos Cgroups Unified Isolator Design

2016-03-20 Thread haosdent
Dear friends. I hope you have a good weekend.

Jie Yu and I are working on adding a new cgroups unified isolator to solve the
problems we encountered in cgroups isolation.

For more details about it, we recorded it in this epic:

[Consolidate cgroup isolators into one single isolator]:
https://issues.apache.org/jira/browse/MESOS-4697

For the design of it, we describe it in this design document:

[Mesos Cgroups Unified Isolator Design]:
https://docs.google.com/document/d/1rAAzymtY5tcXY9X-Ryz6tEFeWA1_VBhnFYoE7M2kbvk/edit?usp=sharing

Any comments and suggestions are appreciated!
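To illustrate the problem the epic addresses — one isolator per cgroup subsystem repeats the same lifecycle plumbing — here is a hedged sketch of a single isolator parameterized by subsystem. The mount point, hierarchy layout, and subsystem list are illustrative assumptions, not the actual design:

```python
# Hedged sketch of the idea behind a unified cgroups isolator: one generic
# routine drives every subsystem instead of duplicating per-subsystem logic.
# The cgroup v1 mount point and the "mesos/<container>" layout are assumed.
CGROUP_ROOT = "/sys/fs/cgroup"
SUBSYSTEMS = ["cpu", "cpuacct", "memory", "devices", "net_cls"]


def container_cgroups(container_id: str) -> dict:
    """Map each subsystem to the container's cgroup path under it."""
    return {s: f"{CGROUP_ROOT}/{s}/mesos/{container_id}" for s in SUBSYSTEMS}


paths = container_cgroups("c1")
print(paths["memory"])  # /sys/fs/cgroup/memory/mesos/c1
```

A unified isolator would then apply create/update/destroy uniformly over these paths rather than once per isolator class.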

-- 
Best Regards,
Haosdent Huang


Re: Backport r/44230 to 0.27 branch

2016-03-20 Thread Jie Yu
Zemeer, thanks for the input. I think we should discuss that in the next
community sync (can you join?). Vinod did some analysis on how people feel
about the release cadence, but I don't see those results being published.
Should we discuss that again and come up with some concrete action items?

- Jie

On Wed, Mar 16, 2016 at 11:57 AM, Zameer Manji  wrote:

> Cong brings up a good point here. Currently Mesos has a very aggressive
> release cadence. This raises several questions for cluster operators
> and framework authors.
>
>- What is the support from the community/committers for each release?
>- Do cluster operators and framework authors need to move at the same
>pace as the community?
>- Will bugfixes be automatically backported?
>
> The lack of clarity here can result in several issues because it is easy
> for the Mesos PMC to cut releases quickly, but it isn't easy for people
> with existing clusters to upgrade at that pace. An aggressive release
> policy without clear support for older releases can leave several users in
> a bad position where they might need to upgrade Mesos through one (or
> more!) releases just to get a critical bugfix.
>
>
>
> On Wed, Mar 16, 2016 at 11:44 AM, Cong Wang 
> wrote:
>
> > On Tue, Mar 15, 2016 at 2:39 PM, Jie Yu  wrote:
> > > Mesos currently has no notion of long term stable releases (i.e.,
> LTS). I
> > > think the consensus in the last community sync was to introduce LTS
> after
> > > 1.0.
> >
> >
> > You don't need LTS like the kernel; even the short-term stable
> releases
> > like 0.27.2 (?) look horrible too. I don't see any git tags or
> > branches for
> > these releases, just a tarball?! Huh...
> >
> >
> > >
> > > 0.27.2 has already been released. Looks like we need 0.27.3 if we want
> to
> > > backport it.
> >
> >
> > What determines which patches need to be backported in the Mesos
> > community? It doesn't look like every bug fix is evaluated and
> > considered after it is merged into the master branch.
> >
> > >
> > > I am OK with backporting it. Then the question is whether we want
> > to
> > > backport it to other releases as well.
> > >
> >
> > It should be backported to whichever releases it applies to and that
> > you support;
> > I don't see that the Mesos community has such a procedure.
> >
> > --
> > Zameer Manji
> >
> >
>


Re: [RESULT][VOTE] Release Apache Mesos 0.28.0 (rc2)

2016-03-20 Thread Kapil Arya
Here is a link to the rpm/deb packages:

http://open.mesosphere.com/downloads/mesos/#apache-mesos-0.28.0

Best,
Kapil

On Thu, Mar 17, 2016 at 2:33 PM, Vinod Kone  wrote:

> +1
>
> @vinodkone
>
> On Mar 17, 2016, at 11:27 AM, Bill Farner  wrote:
>
> Jake - I think that would be wonderful!
>
> On Thu, Mar 17, 2016 at 11:17 AM, Jake Farrell 
> wrote:
>
>> I've been maintaining a deb/rpm set for Mesos; for Aurora and Thrift we
>> have been using the infra-supported Bintray to make packages available to
>> the community via http://www.apache.org/dist/${project}/${os}
>>
>> If there is interest I'd be happy to put some time into bringing my
>> patches
>> into reviews and helping set up Jenkins tests, etc.
>>
>> -Jake
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 17, 2016 at 1:41 PM, Vinod Kone  wrote:
>>
>> > The project itself doesn't officially release rpms/debs, but the
>> community
>> > members do.  For example, Mesosphere is planning to release rpms/debs
>> > shortly.
>> >
>> > On Thu, Mar 17, 2016 at 10:38 AM, craig w  wrote:
>> >
>> > > Great news. Do the RPMs get automatically built and released, or will
>> > they
>> > > come later this week?
>> > >
>> > > On Thu, Mar 17, 2016 at 1:28 PM, Vinod Kone 
>> > wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >>
>> > >> The vote for Mesos 0.28.0 (rc2) has passed with the
>> > >>
>> > >> following votes.
>> > >>
>> > >>
>> > >> +1 (Binding)
>> > >>
>> > >> --
>> > >>
>> > >> Vinod Kone
>> > >>
>> > >> Michael Park
>> > >>
>> > >> Kapil Arya
>> > >>
>> > >>
>> > >> +1 (Non-binding)
>> > >>
>> > >> --
>> > >>
>> > >> Greg Mann
>> > >>
>> > >> Daniel Osborne
>> > >>
>> > >> Jorg Schad
>> > >>
>> > >> Zhitao Li
>> > >>
>> > >>
>> > >> There were no 0 or -1 votes.
>> > >>
>> > >>
>> > >> Please find the release at:
>> > >>
>> > >> https://dist.apache.org/repos/dist/release/mesos/0.28.0
>> > >>
>> > >>
>> > >> It is recommended to use a mirror to download the release:
>> > >>
>> > >> http://www.apache.org/dyn/closer.cgi
>> > >>
>> > >>
>> > >> The CHANGELOG for the release is available at:
>> > >>
>> > >>
>> > >>
>> >
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.28.0
>> > >>
>> > >>
>> > >> The mesos-0.28.0.jar has been released to:
>> > >>
>> > >> https://repository.apache.org
>> > >>
>> > >>
>> > >> The website (http://mesos.apache.org) will be updated shortly to
>> > reflect
>> > >> this release.
>> > >>
>> > >>
>> > >> Thanks,
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > >
>> > > https://github.com/mindscratch
>> > > https://www.google.com/+CraigWickesser
>> > > https://twitter.com/mind_scratch
>> > > https://twitter.com/craig_links
>> > >
>> > >
>> >
>>
>
>


Re: Backport r/44230 to 0.27 branch

2016-03-20 Thread Jie Yu
> like many other review requests are burned or take 6+ months to merge


Have you reached out to any shepherd for that ticket/review?

- Jie

On Wed, Mar 16, 2016 at 12:11 PM, Cong Wang  wrote:

> On Wed, Mar 16, 2016 at 11:58 AM, Jie Yu  wrote:
> >
> > Currently, it's based on request. We definitely need to improve this
> part.
>
>
> It simply doesn't work; like many other review requests, they are burned or
> take 6+ months to merge. I am sure you need to improve that too, but after
> watching the Mesos community for months, I don't see any improvement yet.
>


Re: Backport r/44230 to 0.27 branch

2016-03-20 Thread Jie Yu
>
> Why not check your backlog for your answer? Or do you need me to write
> a script to scan all the pending review requests for you?


OK, I just looked at your pending patches:
https://reviews.apache.org/users/wangcong/?show-closed=0

The associated tickets:
https://issues.apache.org/jira/browse/MESOS-4740
https://issues.apache.org/jira/browse/MESOS-2769
https://issues.apache.org/jira/browse/MESOS-2799

(Some of the RB requests do not have associated tickets.)

I don't see a shepherd for MESOS-4740. Looks like Vinod is the shepherd for
MESOS-2769. MESOS-2799 does not have a shepherd either, but I think that
should be me. Are you still interested in shipping those patches?
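The sort of backlog scan discussed above can be scripted against Review Board's REST API — a hedged sketch: the endpoint shape and field names are assumptions based on Review Board's public Web API, and the demo parses a canned payload rather than making a live call:

```python
# Hedged sketch: list a user's pending Review Board requests. The URL
# template and the "review_requests" list with "id"/"summary" fields
# follow Review Board's REST API as I understand it; treat them as
# assumptions. The sample payload below is invented for illustration.
import json

API = "https://reviews.apache.org/api/review-requests/?from-user={user}&status=pending"


def pending_summaries(payload: dict) -> list:
    """Extract (id, summary) pairs from an API response payload."""
    return [(r["id"], r["summary"]) for r in payload.get("review_requests", [])]


sample = json.loads("""{
  "review_requests": [
    {"id": 40375, "summary": "Add ThrottleInfo to RevocableInfo."},
    {"id": 44230, "summary": "Fix for backport discussion."}
  ]
}""")

for rid, summary in pending_summaries(sample):
    print(rid, summary)
```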

I think you made a valid point that there are some open problems here:
1) Do we want to work on all created tickets (i.e., how do we decide whether
to accept a ticket or not), and who decides that?
2) Once we accept a ticket, how do we prioritize it? Should
PMC members groom the accepted tickets regularly?
3) If no committer volunteers for an accepted ticket, what's the
procedure; should we pick one?
4) What's the procedure for finding another shepherd if the original
shepherd no longer has time?

- Jie


On Wed, Mar 16, 2016 at 2:32 PM, Cong Wang  wrote:

> On Wed, Mar 16, 2016 at 2:21 PM, Jie Yu  wrote:
> > I understand your frustration. I am curious which review/ticket you are
> > talking about, and who the shepherd for your review/ticket is.
>
>
> Why not check your backlog for your answer? Or do you need me to write
> a script to scan all the pending review requests for you?
>
>
> >
> > The Mesos project has a clear guide on how to contribute; that's
> > what the community has agreed on:
> >
> >
> https://github.com/apache/mesos/blob/master/docs/submitting-a-patch.md#before-you-start-writing-code
> >
>
> I assume this doesn't apply to your committers, at least BenM:
>
> commit 152ac2b13916bcf2bb9e52accc4951c3ce5bfd76
> Author: Benjamin Mahler 
> Date:   Sun Feb 21 14:22:07 2016 +0100
>
> Log the shutdown duration in the executor driver.
>
> commit 1488f16d283f69b7dc96feaee91b04a09012ca4a
> Author: Benjamin Mahler 
> Date:   Sat Feb 20 17:35:30 2016 +0100
>
>
> Added TASK_KILLING to the API changes in the CHANGELOG.
>
>
> commit 978ccb5dd637f0e1577ecae1e21973f50429b04c
> Author: Benjamin Mahler 
> Date:   Sat Feb 20 17:28:58 2016 +0100
>
>
> Added docker executor tests for TASK_KILLING.
>
>
> commit ee86b13633a9469629dbd79681d0776b6020f76a
> Author: Benjamin Mahler 
> Date:   Sat Feb 20 16:18:22 2016 +0100
>
>
> Added command executor tests for TASK_KILLING.
>
>
> commit 25d303d8743b524c92627d48f7dfb7ac2a921ede
> Author: Benjamin Mahler 
> Date:   Sat Feb 20 15:31:28 2016 +0100
>
>
> Fixed health check process leak when shutdown is called without
> killTask.
>
>
>
> > "Find a shepherd to collaborate on your patch. A shepherd is a Mesos
> > committer that will work with you to give you feedback on your proposed
> > design, and to eventually commit your change into the Mesos source tree."
> >
>
> This doesn't work, and it needs to change. I already stated my reason in the
> previous reply, which was just ignored, yeah, like many other requests.
>


Re: Backport r/44230 to 0.27 branch

2016-03-20 Thread Zhitao Li
Maybe we can try to draft a formal guideline about when/how something
should be backported, making sure interested parties in the community
have a chance to get their voices heard?

I'm also interested in knowing how much work cutting backport releases
generates, and how the community could help.

On Wed, Mar 16, 2016 at 12:11 PM, Cong Wang  wrote:

> On Wed, Mar 16, 2016 at 11:58 AM, Jie Yu  wrote:
> >
> > Currently, it's based on request. We definitely need to improve this
> part.
>
>
> It simply doesn't work; like many other review requests, they are burned or
> take 6+ months to merge. I am sure you need to improve that too, but after
> watching the Mesos community for months, I don't see any improvement yet.
>



-- 
Cheers,

Zhitao Li