Re: [VOTE] Release Apache Mesos 0.26.0 (rc4)

2015-12-10 Thread Benjamin Mahler
To be clear, I'm a binding -1 without this fix pulled in unless folks can
convince me otherwise. :)

Are we sure that folks are even using the consecutive failure health check
feature successfully? When a colleague showed me how he ran into this bug,
it was happening with a pretty high frequency! This led to us spending
hours figuring out what was wrong. That's a really bad experience to give
users, and might be the kind of thing that bites us and warrants a 0.26.1.

Happy to vote promptly on the next RC! Thanks for all the hard work so far
with the release, it really is appreciated.

On Thu, Dec 10, 2015 at 11:22 AM, Benjamin Mahler wrote:

> What is the workaround?
>
> On Thu, Dec 10, 2015 at 4:37 AM, Bernd Mathiske wrote:
>
>> I think that whereas this would clearly be a desirable bug fix to have,
>> it is not a blocker:
>> - Not a regression. This problem has been around for a long time, since
>> 0.20 AFAIK.
>> - There is a simple workaround.
>>
>> Bernd
>>
>> On Dec 10, 2015, at 3:05 AM, Benjamin Mahler wrote:
>>
>> I'd really like to pull in the fix for:
>> https://issues.apache.org/jira/browse/MESOS-4106
>>
>> This has been a long standing bug that makes the health checking not
>> function correctly some of the time. While it is rare in CI, it appeared in
>> a colleague's cluster for about a third of the tasks he was launching to
>> demonstrate how he ran into this. The fix is trivial and is in review.
>>
>> Thoughts?
>>
>> On Tue, Dec 8, 2015 at 7:01 AM, Bernd Mathiske wrote:
>>
>>> +1 (binding)
>>>
>>> Ran make check, make distcheck, sudo bin/mesos-tests.sh, with SSL
>>> enabled and without on: Ubuntu 12.04, CentOS 7.1.
>>>
>>> Had 4 test failures with CentOS 7.1 for each configuration variant. All
>>> of the failed tests are known to be flaky, they have MESOS tickets, and
>>> they have been investigated and are deemed non-blockers.
>>>
>>> Bernd
>>>
>>> > On Dec 8, 2015, at 4:59 AM, Till Toenshoff  wrote:
>>> >
>>> > Hi friends,
>>> >
>>> > we had noticed some discrepancies between the V0 API and the V1 API,
>>> > hence we had to create a new release candidate even after the voting of
>>> > 0.26.0-rc3 had officially ended. Sorry for that!
>>> >
>>> > Please vote on releasing the following candidate as Apache Mesos
>>> > 0.26.0.
>>> >
>>> > The CHANGELOG for the release is available at:
>>> >
>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.26.0-rc4
>>> >
>>> > The candidate for Mesos 0.26.0 release is available at:
>>> >
>>> https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc4/mesos-0.26.0.tar.gz
>>> >
>>> > The tag to be voted on is 0.26.0-rc4:
>>> >
>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.26.0-rc4
>>> >
>>> > The MD5 checksum of the tarball can be found at:
>>> >
>>> https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc4/mesos-0.26.0.tar.gz.md5
>>> >
>>> > The signature of the tarball can be found at:
>>> >
>>> https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc4/mesos-0.26.0.tar.gz.asc
>>> >
>>> > The PGP key used to sign the release is here:
>>> > https://dist.apache.org/repos/dist/release/mesos/KEYS
>>> >
>>> > The JAR is up in Maven in a staging repository here:
>>> > https://repository.apache.org/content/repositories/orgapachemesos-1093
>>> >
>>> > Please vote on releasing this package as Apache Mesos 0.26.0!
>>> >
>>> > The vote is open until Fri Dec 11 04:50:51 CET 2015 and passes if a
>>> > majority of at least 3 +1 PMC votes are cast.
>>> >
>>> > [ ] +1 Release this package as Apache Mesos 0.26.0
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > Thanks,
>>> > Bernd & Till
>>> >
>>>
>>>
>>
>>
>


[VOTE] Release Apache Mesos 0.26.0 (rc5)

2015-12-10 Thread Till Toenshoff
Hi friends,

Unfortunately, we once again ran into an issue that needed immediate
attention (see the vote on rc4), so we have to ask for another round of
testing and voting on this newest release candidate.

The issue leading to this new release candidate was MESOS-4106:
https://issues.apache.org/jira/browse/MESOS-4106

Apart from that, we also pulled in a fix for MESOS-4015
(https://issues.apache.org/jira/browse/MESOS-4015), as we believe it carries
minimal additional risk while being very useful for some of us.

Please vote on releasing the following candidate as Apache Mesos 0.26.0.

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.26.0-rc5


The candidate for Mesos 0.26.0 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc5/mesos-0.26.0.tar.gz

The tag to be voted on is 0.26.0-rc5:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.26.0-rc5

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc5/mesos-0.26.0.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc5/mesos-0.26.0.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1095
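
For anyone testing the candidate, the MD5 checksum listed above can be
compared against a locally downloaded tarball. The following is only a
minimal sketch, not part of the release tooling: the function names are
hypothetical, and it assumes you pass in the expected digest as a hex string
copied by hand from the .md5 file (the exact layout of that file is not
assumed here). It does not replace checking the PGP signature.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare the file's MD5 with an expected digest.

    `expected_hex` is assumed to be the hex digest only; whitespace and
    letter case are ignored so that "AB CD ..." style digests also work.
    """
    expected = "".join(expected_hex.split()).lower()
    return md5_of_file(path) == expected
```

A tester would call checksum_matches("mesos-0.26.0.tar.gz", digest) with the
digest taken from the .md5 file linked above.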

Please vote on releasing this package as Apache Mesos 0.26.0!

The vote is open until Tue Dec 15 22:35:22 CET 2015 and passes if a majority of 
at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 0.26.0
[ ] -1 Do not release this package because ...
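
The passing rule stated above can be sketched as a small tally function.
This is one reading of the rule, not an official tool: it assumes that only
binding (PMC) votes count, that at least three binding +1 votes are needed,
and that binding +1 votes must outnumber binding -1 votes. The function name
and the (value, binding) tuple layout are hypothetical.

```python
def release_vote_passes(votes):
    """Decide whether a release vote passes.

    `votes` is an iterable of (value, binding) pairs, where `value` is
    +1, 0 or -1 and `binding` is True for PMC votes. Non-binding votes
    are ignored by this sketch.
    """
    votes = list(votes)
    binding_plus = sum(1 for value, binding in votes if binding and value == 1)
    binding_minus = sum(1 for value, binding in votes if binding and value == -1)
    return binding_plus >= 3 and binding_plus > binding_minus
```

Under these assumptions, three binding +1 votes pass even alongside any
number of non-binding votes, while two binding +1 votes do not.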

Thanks,
Till & Bernd

Allocator API changes

2015-12-10 Thread Neil Conway
Hi everyone,

The allocator API [1] is going to change in the forthcoming 0.26
release [2]. Custom allocators will need to implement several new API
methods. Further changes to the allocator API are being contemplated
for the 0.27 release [3].

If you have built a custom allocator, please speak up! We'd like to
understand what you're using the API for, and what we can do to
minimize any disruptions that might be caused by future API changes.

Thanks!

Neil

[1] 
https://github.com/apache/mesos/blob/master/include/mesos/master/allocator.hpp
[2] See these commits and related work on quota support:

https://github.com/apache/mesos/commit/6b66fb712224243100065b295efb18a1cb1f7181
https://github.com/apache/mesos/commit/1538a4df3752ce177b6bc16519e1f57893ebdc09

[3] See https://issues.apache.org/jira/browse/MESOS-4085 and related JIRAs.


Re: [VOTE] Release Apache Mesos 0.26.0 (rc4)

2015-12-10 Thread Bernd Mathiske
I think that whereas this would clearly be a desirable bug fix to have, it is 
not a blocker:
- Not a regression. This problem has been around for a long time, since 0.20 
AFAIK.
- There is a simple workaround.

Bernd

> On Dec 10, 2015, at 3:05 AM, Benjamin Mahler wrote:
>
> I'd really like to pull in the fix for:
> https://issues.apache.org/jira/browse/MESOS-4106
>
> This has been a long standing bug that makes the health checking not function 
> correctly some of the time. While it is rare in CI, it appeared in a 
> colleague's cluster for about a third of the tasks he was launching to 
> demonstrate how he ran into this. The fix is trivial and is in review.
> 
> Thoughts?
> 
> On Tue, Dec 8, 2015 at 7:01 AM, Bernd Mathiske wrote:
> +1 (binding)
> 
> Ran make check, make distcheck, sudo bin/mesos-tests.sh, with SSL enabled and 
> without on: Ubuntu 12.04, CentOS 7.1.
> 
> Had 4 test failures with CentOS 7.1 for each configuration variant. All of 
> the failed tests are known to be flaky, they have MESOS tickets, and they 
> have been investigated and are deemed non-blockers.
> 
> Bernd
> 
> > On Dec 8, 2015, at 4:59 AM, Till Toenshoff wrote:
> >
> > Hi friends,
> >
> > we had noticed some discrepancies between the V0 API and the V1 API,
> > hence we had to create a new release candidate even after the voting of
> > 0.26.0-rc3 had officially ended. Sorry for that!
> >
> > Please vote on releasing the following candidate as Apache Mesos 0.26.0.
> >
> > The CHANGELOG for the release is available at:
> > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.26.0-rc4
> >
> > The candidate for Mesos 0.26.0 release is available at:
> > https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc4/mesos-0.26.0.tar.gz
> >
> > The tag to be voted on is 0.26.0-rc4:
> > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.26.0-rc4
> >
> > The MD5 checksum of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc4/mesos-0.26.0.tar.gz.md5
> >
> > The signature of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc4/mesos-0.26.0.tar.gz.asc
> >
> > The PGP key used to sign the release is here:
> > https://dist.apache.org/repos/dist/release/mesos/KEYS
> >
> > The JAR is up in Maven in a staging repository here:
> > https://repository.apache.org/content/repositories/orgapachemesos-1093
> >
> > Please vote on releasing this package as Apache Mesos 0.26.0!
> >
> > The vote is open until Fri Dec 11 04:50:51 CET 2015 and passes if a 
> > majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Mesos 0.26.0
> > [ ] -1 Do not release this package because ...
> >
> > Thanks,
> > Bernd & Till
> >
> 
> 





Re: Please review design doc for task resizing

2015-12-10 Thread Qian Zhang
Since we all agree that option 2 is the best choice for the scheduler API
change, I have updated the design doc to list it as the first option, which
marks it as the final decision.

> I see. However, that operation is not idempotent. Imagine you issue a
> resize request and for some reason, the request takes long to carry out and
> you don't have a way to guarantee that the request was received (for
> example, during a master failover). In the mean time, you issue another
> resize. When both land, it may not be the action you wanted.
> containerizer->update() applies the aggregate size anyway, so you need to
> keep track of the 'sign' of the resize all the way down to the slave
> process.
>

Yes, I understand that the operation in the current design is not idempotent.
But I think that when a master fails over, the framework will reconcile with
the master to learn the latest resources used by its task, and can then
decide whether or not to issue another resize operation.
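
To make the idempotency concern concrete, here is a minimal sketch of the
difference between a relative (delta) resize and an absolute (target) resize
when the same request is delivered more than once, for example because it
was retried around a master failover. Everything here is hypothetical and
for illustration only: the Task class and function names are not part of the
Mesos API or of the design doc.

```python
from dataclasses import dataclass


@dataclass
class Task:
    cpus: float


def resize_by_delta(task, delta):
    """Relative resize: each delivery adds `delta`, so duplicates add up."""
    task.cpus += delta


def resize_to_target(task, target):
    """Absolute resize: each delivery sets the same value, so duplicates are harmless."""
    task.cpus = target


def replay(apply, initial_cpus, argument, deliveries):
    """Apply the same resize request `deliveries` times and return the final size."""
    task = Task(cpus=initial_cpus)
    for _ in range(deliveries):
        apply(task, argument)
    return task.cpus


# A task with 2 CPUs that should grow to 3 CPUs, with the request landing twice:
#   replay(resize_by_delta, 2.0, 1.0, 2)   -> 4.0 (not what was intended)
#   replay(resize_to_target, 2.0, 3.0, 2)  -> 3.0 (same as a single delivery)
```

With a delta-style operation, the framework has to learn the actual state
(for example through reconciliation) before deciding whether to retry.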

> > And I have 2 more questions that I want to discuss with you:
> > 1. David G raised a user story about how a framework should be able to
> > resize its executor. I think this is a valid use case, but I would suggest
> > that we focus on task resizing in the MVP and handle executor resizing
> > post-MVP. What do you think?
> > 2. Do you think we need to involve the executor in task resizing? E.g., let
> > the slave send a message (e.g., RunTaskMessage) to the executor so that the
> > executor can do the actual resizing? The reason I raise this question is
> > that I think in some cases the executor needs to be aware of the resized
> > resources. E.g., if a framework adds a new port to a task, the executor and
> > task should know about the new port so that the task can start to use it.
> > And in the Kubernetes on Mesos case, a user may want to resize a pod which
> > is actually created and managed by k8sm-executor, so it should be involved
> > in resizing the resources of the pod.
> >
>
> Maybe we can do that down the line; as an MVP, maybe we can skip it but
> have a model that supports it?
> Using the task info as a 'desired state', changing the executor info
> resources could be used to change its size. However, there are some
> details in terms of master failover and slave reregistration where executor
> infos are sent from the slaves, where we need to be careful.
>

So you mean that we do not need to implement executor resizing in the MVP,
but should cover it in the design doc so that we know how we are going to
implement it post-MVP, right? I am not sure what you mean by "Using the task
info as a 'desired state'". I think we will not leverage or change TaskInfo
in this project, so can you please elaborate?

And any comments on my second question above? Do you think we need to
involve the executor in task resizing?

> > Currently I do not have a PoC implementation for my proposal yet. Do you
> > recommend that we have one now? Or after the design is close to being
> > finalized, or at least after we decide among the 3 options for the
> > scheduler API changes in the design doc?
> >
>
> Doesn't hurt to experiment and see if there are obvious things that we
> missed to address.
> If you haven't done any work yet, I'd maybe defer until we at least have
> the placement of the 'resize operation' nailed down.
>

OK, so you prefer that we start the PoC implementation after we finalize the
design of the resize operation in the scheduler API, right? I think that
should be settled now, since we all agree option 2 is the best.


> > I'd like to have an online sync up with you, can you please let me know
> > when you will be online in IRC usually? Or you prefer other ways to sync
> > up? I will try to catch you :-)
> >
>
> Let's do a joint call; how about Friday or Monday?
> I am available in business hours PST.


Sure, how about 4:00 pm PST this Friday? And do you prefer IRC, a Skype call,
or some other way? :-)


Regards,
Qian


Re: [VOTE] Release Apache Mesos 0.26.0 (rc4)

2015-12-10 Thread Jan Schlicht
+1 (non-binding)

Tested on OS X, Fedora 23, and CentOS 7.1 with and without SSL.
`sudo ./bin/mesos-tests.sh` was fine, apart from known (non-blocker) issues.

On Thu, Dec 10, 2015 at 1:37 PM, Bernd Mathiske wrote:

> I think that whereas this would clearly be a desirable bug fix to have, it
> is not a blocker:
> - Not a regression. This problem has been around for a long time, since
> 0.20 AFAIK.
> - There is a simple workaround.
>
> Bernd
>
> On Dec 10, 2015, at 3:05 AM, Benjamin Mahler wrote:
>
> I'd really like to pull in the fix for:
> https://issues.apache.org/jira/browse/MESOS-4106
>
> This has been a long standing bug that makes the health checking not
> function correctly some of the time. While it is rare in CI, it appeared in
> a colleague's cluster for about a third of the tasks he was launching to
> demonstrate how he ran into this. The fix is trivial and is in review.
>
> Thoughts?
>
> On Tue, Dec 8, 2015 at 7:01 AM, Bernd Mathiske wrote:
>
>> +1 (binding)
>>
>> Ran make check, make distcheck, sudo bin/mesos-tests.sh, with SSL enabled
>> and without on: Ubuntu 12.04, CentOS 7.1.
>>
>> Had 4 test failures with CentOS 7.1 for each configuration variant. All
>> of the failed tests are known to be flaky, they have MESOS tickets, and
>> they have been investigated and are deemed non-blockers.
>>
>> Bernd
>>
>> > On Dec 8, 2015, at 4:59 AM, Till Toenshoff  wrote:
>> >
>> > Hi friends,
>> >
>> > we had noticed some discrepancies between the V0 API and the V1 API,
>> > hence we had to create a new release candidate even after the voting of
>> > 0.26.0-rc3 had officially ended. Sorry for that!
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 0.26.0.
>> >
>> > The CHANGELOG for the release is available at:
>> >
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.26.0-rc4
>> >
>> > The candidate for Mesos 0.26.0 release is available at:
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc4/mesos-0.26.0.tar.gz
>> >
>> > The tag to be voted on is 0.26.0-rc4:
>> >
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.26.0-rc4
>> >
>> > The MD5 checksum of the tarball can be found at:
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc4/mesos-0.26.0.tar.gz.md5
>> >
>> > The signature of the tarball can be found at:
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc4/mesos-0.26.0.tar.gz.asc
>> >
>> > The PGP key used to sign the release is here:
>> > https://dist.apache.org/repos/dist/release/mesos/KEYS
>> >
>> > The JAR is up in Maven in a staging repository here:
>> > https://repository.apache.org/content/repositories/orgapachemesos-1093
>> >
>> > Please vote on releasing this package as Apache Mesos 0.26.0!
>> >
>> > The vote is open until Fri Dec 11 04:50:51 CET 2015 and passes if a
>> > majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Mesos 0.26.0
>> > [ ] -1 Do not release this package because ...
>> >
>> > Thanks,
>> > Bernd & Till
>> >
>>
>>
>
>


-- 
*Jan Schlicht*
Distributed Systems Engineer, Mesosphere