Re: Release policy and 1.6 release schedule

2018-03-23 Thread Vinod Kone
I’m +1 for quarterly. 

Most importantly, I want us to adhere to a predictable cadence.

Sent from my phone

Re: Release policy and 1.6 release schedule

2018-03-23 Thread Jie Yu
>
> 2) Backporting will be a burden if releases are too short. I think that in
> practice, backporting will not take too much longer. If there was a
> conflict back in the tree somewhere, then it's likely that after resolving
> that conflict once, the same diff can be used to backport the change to
> previous releases as well.


I think the burden of maintaining a release branch is not just backporting.
We also need to run CI to make sure every maintained release branch is
working, and do testing for each one. It's a burden if there are too many
release branches.
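That CI cost scales multiplicatively with the test matrix; a toy sketch in Python (branch, platform, and suite names are made up for illustration):

```python
from itertools import product

def ci_matrix(branches, platforms, suites):
    """Return one CI job per (branch, platform, suite) combination.

    The number of jobs grows linearly with the number of maintained
    release branches, which is the maintenance burden being discussed.
    """
    return [
        {"branch": b, "platform": p, "suite": s}
        for b, p, s in product(branches, platforms, suites)
    ]

# Hypothetical matrix: three maintained branches (the quarterly proposal)
# versus six (a monthly cadence with a ~6-month support window).
quarterly = ci_matrix(["1.3.x", "1.4.x", "1.5.x"],
                      ["ubuntu", "centos"], ["unit", "integration"])
monthly = ci_matrix([f"1.{n}.x" for n in range(3, 9)],
                    ["ubuntu", "centos"], ["unit", "integration"])

print(len(quarterly), len(monthly))  # 12 24
```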

- Jie

Re: Release policy and 1.6 release schedule

2018-03-23 Thread Jie Yu
Supporting multiple releases is a burden.

1.2 was released March, 2017 (1 year ago), and I know that some users are
still on that version
1.3 was released June, 2017 (9 months ago), and we're still maintaining it
(we still backported patches several days ago, which some users asked for)
1.4 was released Sept, 2017 (6 months ago).
1.5 was released Feb, 2018 (1 month ago).

As you can see, users expect a release to be supported for 6-9 months (e.g.,
backports are still needed for the 1.3 release, which is 9 months old). If we
were to do monthly minor releases, we'd probably need to maintain 6-9
release branches. That's too much to ask of committers and maintainers.

I also agree with folks that there are benefits to doing releases more
frequently. Given the historical data, I'd suggest we do quarterly
releases, and maintain three release branches.

- Jie
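To make the quarterly proposal concrete: with three maintained branches, a release drops out of support once three newer minors exist. A small sketch against the release history above (the policy parameters are illustrative):

```python
def supported(releases, max_branches=3):
    """Given (version, release_date) pairs ordered oldest-first, return
    the versions still maintained under an N-branch policy: only the
    newest `max_branches` releases keep their branches alive."""
    return [version for version, _ in releases[-max_branches:]]

# Release history from the thread (dates simplified to year-month).
history = [
    ("1.2", "2017-03"),
    ("1.3", "2017-06"),
    ("1.4", "2017-09"),
    ("1.5", "2018-02"),
]

print(supported(history))  # ['1.3', '1.4', '1.5'] -- 1.2 drops out
```

With quarterly releases, staying among the newest three branches corresponds to roughly nine months of support, which lines up with the 6-9 month expectation above.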

On Fri, Mar 23, 2018 at 10:03 AM, Greg Mann  wrote:

> The best motivation I can think of for a shorter release cycle is this: if
> the release cadence is fast enough, then developers will be less likely to
> rush a feature into a release. I think this would be a real benefit, since
> rushing features in hurts stability. *However*, I'm not sure if every two
> months is fast enough to bring this benefit. I would imagine that a
> two-month wait is still long enough that people wouldn't want to wait an
> entire release cycle to land their feature. Just off the top of my head, I
> might guess that a release cadence of 1 month or shorter would be frequent
> enough that it would always seem reasonable for a developer to wait until
> the next release to land a feature. What do y'all think?
>
> Other motivating factors that have been raised are:
> 1) Many users upgrade on a longer timescale than every ~2 months. I think
> that this doesn't need to affect our decision regarding release timing -
> since we guarantee compatibility of all releases with the same major
> version number, there is no reason that a user needs to upgrade minor
> releases one at a time. It's fine to go from 1.N to 1.(N+3), for example.
> 2) Backporting will be a burden if releases are too short. I think that in
> practice, backporting will not take too much longer. If there was a
> conflict back in the tree somewhere, then it's likely that after resolving
> that conflict once, the same diff can be used to backport the change to
> previous releases as well.
> 3) Adhering strictly to a time-based release schedule will help users plan
> their deployments, since they'll be able to rely on features being released
> on-schedule. However, if we do strict time-based releases, then it will be
> less certain that a particular feature will land in a particular release,
> and users may have to wait a release cycle to get the feature.
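The resolve-once, reuse-the-diff workflow in point 2 can be sketched as a backport planner that walks the maintained branches newest-first (the branch names, example sha, and `cherry-pick -x` convention are illustrative, not project policy):

```python
def backport_plan(sha, branches):
    """Plan cherry-picks of `sha` onto each maintained release branch,
    newest first: any conflict is resolved once, on the newest branch,
    and the resulting commit is what gets picked onto older branches."""
    ordered = sorted(branches, reverse=True)  # e.g. 1.5.x before 1.3.x
    plan = []
    for branch in ordered:
        plan.append([
            f"git checkout {branch}",
            f"git cherry-pick -x {sha}",  # -x records the original sha
        ])
    return plan

plan = backport_plan("abc123", ["1.3.x", "1.5.x", "1.4.x"])
print([step[0] for step in plan])
# ['git checkout 1.5.x', 'git checkout 1.4.x', 'git checkout 1.3.x']
```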
>
> Personally, I find the idea of preventing features from being rushed into a
> release very compelling. From that perspective, I would love to see
> releases every month. However, if we're not going to release that often,
> then I think it does make sense to adjust our release schedule to
> accommodate the features that community members want to land in a
> particular release.
>
>
> Jie, I'm curious why you suggest a *minimal* interval between releases.
> Could you elaborate a bit on your motivations there?
>
> Cheers,
> Greg
>
>
> On Fri, Mar 16, 2018 at 2:01 PM, Jie Yu  wrote:
>
> > Thanks Greg for starting this thread!
> >
> >
> >> My primary motivation here is to bring our documented policy in line
> >> with our practice, whatever that may be
> >
> >
> > +100
> >
> > Do people think that we should attempt to bring our release cadence more
> >> in line with our current stated policy, or should the policy be changed
> >> to reflect our current practice?
> >
> >
> > I think a minor release every 2 months is probably too aggressive. I don't
> > have concrete data, but my feeling is that the frequency with which folks
> > upgrade Mesos is low. I know that many users are still on 1.2.x.
> >
> > I'd actually suggest that we have a *minimal* interval between two
> > releases (e.g., 3 months), and provide some buffer for the release process
> > (so we're expecting about 3 releases per year, which matches what we did
> > last year).
> >
> > And we use our dev sync to coordinate on a release after the minimal
> > release interval has elapsed (and elect a release manager).
> >
> > - Jie
> >
> > On Wed, Mar 14, 2018 at 9:51 AM, Zhitao Li  wrote:
> >
> >> An additional data point is how long it takes from the first RC being cut
> >> to the final release tag vote passing. That probably indicates the
> >> smoothness of the release process and how good the quality control
> >> measures are.
> >>
> >> I would argue for not delaying releases for new features and aligning
> >> with the schedule we declared in policy. That makes it easier for
> >> upstream projects to gauge when a

Re: Support deadline for tasks

2018-03-23 Thread Benjamin Mahler
Ah, I was more curious about why they need to be killed after a timeout,
e.g., after a particular deadline the work is useless (in Zhitao's case).

On Fri, Mar 23, 2018 at 6:22 PM Sagar Sadashiv Patwardhan 
wrote:

> Hi Benjamin,
> We have a few tasks that should be killed after
> some timeout. We currently have some logic in our scheduler to kill these
> tasks. Would be nice to delegate this to the executor.
>
> - Sagar


Re: Support deadline for tasks

2018-03-23 Thread Benjamin Mahler
Sagar, could you share your use case? Or is it exactly the same as Zhitao's?

On Fri, Mar 23, 2018 at 3:15 PM, Sagar Sadashiv Patwardhan 
wrote:

> +1
>
> This will be useful for us (Yelp) as well.


Re: Communicate with a container while using Mesos unified container runtime

2018-03-23 Thread Karan Pradhan
Thanks for all the ideas. I'll try the DC/OS CLI after upgrading Mesos.



Re: Communicate with a container while using Mesos unified container runtime

2018-03-23 Thread Daemeon Reiydelle
For what I understand to be your use case, I would have queuing services that
the container queries for its next task, e.g. Kafka "queues".
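In outline, that queue-driven pattern replaces exec'ing into the container with the container polling for work; a minimal sketch, with an in-memory queue standing in for Kafka and all names illustrative:

```python
import queue

def run_worker(task_queue, handler):
    """Process batches until a None sentinel arrives, then shut down.

    In production the queue would be Kafka or similar; the container
    simply polls it, so nothing needs to exec into the container."""
    processed = 0
    while True:
        batch = task_queue.get()
        if batch is None:  # sentinel: no more batches, container exits
            break
        handler(batch)
        processed += 1
    return processed

q = queue.Queue()
for batch in (["a", "b"], ["c"]):
    q.put(batch)
q.put(None)  # signal shutdown

results = []
count = run_worker(q, results.extend)
print(count)    # 2
print(results)  # ['a', 'b', 'c']
```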


Please pardon typos ... sent from mobile
daeme...@gmail.com
USA MOBILE: 415.501.0198 (California)

Re: Communicate with a container while using Mesos unified container runtime

2018-03-23 Thread Gilbert Song
No, this feature is based on Container Attach/Exec, which was included
starting from Mesos 1.2.0. I would recommend an upgrade to Mesos 1.4.1 or
1.3.2.
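For reference, the DC/OS CLI exposes attach/exec through `dcos task exec`; a hedged sketch of building such an invocation from Python (the task id is hypothetical, and flag names should be checked against your CLI version):

```python
import subprocess

def task_exec_argv(task_id, command, interactive=False, tty=False):
    """Build a `dcos task exec` invocation for a container launched by
    the Mesos containerizer (agent must be >= 1.2.0 for attach/exec)."""
    argv = ["dcos", "task", "exec"]
    if interactive:
        argv.append("--interactive")
    if tty:
        argv.append("--tty")
    argv.append(task_id)
    argv.extend(command)
    return argv

argv = task_exec_argv("batch-job.instance-1234", ["ls", "-la"])
print(argv)
# To actually run it (requires a configured DC/OS CLI):
# subprocess.run(argv, check=True)
```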

On Thu, Mar 22, 2018 at 5:46 PM, Karan Pradhan 
wrote:

>
>
> On 2018/03/22 23:16:06, Gilbert Song  wrote:
> > Hi Karan,
> >
> > It does not seem to me that launching more Mesos containers would add
> > more overhead.
> >
> > If you want to achieve *docker exec* for debugging purposes, Mesos
> > supports that (not in the Mesos CLI yet /cc Armand and Kevin), but you
> > could still rely on the DC/OS CLI to do that, given you have the taskId.
> >
> > Gilbert
> >
> > On Wed, Mar 21, 2018 at 12:08 PM, Karan Pradhan 
> > wrote:
> >
> > >
> > >
> > > On 2018/03/21 18:06:48, Gilbert Song  wrote:
> > > > Hi Karan,
> > > >
> > > > Before figuring out some ways to achieve this with Mesos, I would
> like to
> > > > better understand your use cases.
> > > >
> > > > Do you mean you rely on `docker attach/exec` to send commands to an
> > > > existing running container?
> > > >
> > > > Is there any reason that keeps you from launching a container for
> each
> > > > batch job?
> > > >
> > > > Gilbert
> > > >
> > > > On Wed, Mar 21, 2018 at 10:29 AM, karanprad...@gmail.com <
> > > > karanprad...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I was using Docker for running my batch job, in which I would follow
> > > > > this approach:
> > > > >
> > > > > 1. Start the docker container
> > > > > 2. Send commands to the running Docker container with the help of
> > > > > the Docker Python client for each batch of objects.
> > > > > 3. After all the batches are processed, shut down the
> > > > > container.
> > > > >
> > > > > I wanted to achieve the same with the help of Mesos and Marathon, to
> > > > > spin up containers and submit commands per batch.
> > > > > But looking at the documents, it looks like this behavior is not
> > > > > achievable: when Mesos spins up a Docker container with the Mesos
> > > > > containerizer and docker/runtime isolation, you can submit only one
> > > > > command, after which the Mesos framework is killed.
> > > > >
> > > > > It would be great if someone could point me to a way to achieve
> > > > > this using the Mesos containerizer.
> > > > >
> > > > > Thanks,
> > > > > Karan
> > > > >
> > > >
> > > Hi Gilbert,
> > > Thanks for taking time answering my question.
> > >
> > > Yes, as you mentioned, I use docker exec to run commands in the
> > > container. There is no particular reason why we don't run a new Docker
> > > container each time. Would that add overhead if I had multiple batches
> > > that need to be processed?
> > >
> > > Do you know if docker exec is possible on a Mesos container running
> > > with docker/runtime isolation?
> > >
> > > Thanks,
> > > Karan
> > >
> >
> Thanks Gilbert.
> I have built mesos 1.1.1 from the apache distribution, would dc/os cli
> work for this version too?
>


Re: Support deadline for tasks

2018-03-23 Thread Benjamin Mahler
Also, it's advantageous for Mesos to be aware of a hard deadline when it
comes to resource allocation. We know that some resources will free up and
can make better decisions when it comes to preemption, for example.
Currently, Mesos doesn't know if a task will run forever or will run to
completion.



Re: Mesos scalability

2018-03-23 Thread Benjamin Mahler
Hi Karan,

Only one master can be elected leader in the current architecture. It's
unlikely we're at a point where we need to balance work across masters to
push scalability further. That comes with a lot of complexity, and we still
have a lot of room for performance improvements on a single leader
architecture.

There are successful clusters sized beyond 35k machines as well, so
performance generally depends on the characteristics of the workloads. I
believe folks running clusters this large generally use a high-core-count
server (e.g. 24 cores); whether this is necessary is not clear to me and
again depends on the workloads, but certainly the master will continue to
improve its ability to leverage more cores for better performance.

In terms of the performance issues you've experienced, if you could provide
some data (e.g. flamegraphs of a backlogged master) that would help us
start to get a better sense of what's happening.
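Besides flamegraphs, the master's `/metrics/snapshot` endpoint exposes event-queue gauges that show backlog directly; a sketch of pulling them out (gauge names per the Mesos monitoring docs; the host/port and sample values are assumptions):

```python
import json
from urllib.request import urlopen

EVENT_QUEUE_GAUGES = [
    "master/event_queue_messages",
    "master/event_queue_dispatches",
    "master/event_queue_http_requests",
]

def event_queue_depths(snapshot):
    """Pick the event-queue gauges out of a /metrics/snapshot dict."""
    return {k: snapshot[k] for k in EVENT_QUEUE_GAUGES if k in snapshot}

def poll(master="http://localhost:5050"):
    # Assumes a live master at this address.
    with urlopen(master + "/metrics/snapshot") as resp:
        return event_queue_depths(json.load(resp))

# Offline example with a captured snapshot fragment (values made up):
sample = {"master/event_queue_messages": 4211.0,
          "master/event_queue_dispatches": 117.0,
          "master/uptime_secs": 86400.0}
print(event_queue_depths(sample))
```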

Ben

On Thu, Mar 22, 2018 at 2:26 PM, Karan Pradhan 
wrote:

> Hi All,
>
> I had the following questions:
> 1.
> I was wondering if it is possible to have multiple Mesos masters as
> elected masters in a Mesos cluster so that the load can be balanced amongst
> the masters. Is there a way to achieve this?
> In general, can there be a load balancer for the Mesos masters?
>
> 2.
> I have seen spikes in the Mesos event queues while running Spark SQL
> workloads with multiple stages. So I was wondering what is a better way to
> handle these scalability issues. I noticed that compute intensive machines
> were able to deal with those workloads better. Is there a particular
> hardware requirement or requirement for the number of masters for scaling a
> Mesos cluster horizontally? After reading success stories which mention
> that Mesos is deployed for ~10K machines, I was curious about the hardware
> used and the number of masters in this case.
>
> It would be awesome if I could get some insight into these questions.
>
> Thanks,
> Karan
>
>
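The event-queue spikes described above can also be watched for programmatically, which makes it easier to capture data (like flamegraphs) at the right moment. A minimal sketch, assuming the master's `/metrics/snapshot` endpoint and metric names such as `master/event_queue_messages` — worth verifying against your Mesos version's metrics documentation:

```python
import json
from urllib.request import urlopen

# Event-queue metric keys believed to be exposed at /metrics/snapshot;
# verify these names against your Mesos version.
QUEUE_METRICS = [
    "master/event_queue_messages",
    "master/event_queue_dispatches",
    "master/event_queue_http_requests",
]

def backlogged(snapshot, threshold=1000):
    """Return the queue metrics in `snapshot` that exceed `threshold`."""
    return {k: snapshot[k] for k in QUEUE_METRICS
            if snapshot.get(k, 0) > threshold}

def poll_master(host="localhost", port=5050, threshold=1000):
    """Fetch the flat JSON metrics map from the master and flag backlogs."""
    with urlopen(f"http://{host}:{port}/metrics/snapshot") as resp:
        snapshot = json.load(resp)
    return backlogged(snapshot, threshold)
```

When `backlogged` returns a non-empty result, that would be the moment to grab a profile of the master process for the kind of flamegraph Ben asks about.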


Re: Support deadline for tasks

2018-03-23 Thread James Peach


> On Mar 23, 2018, at 9:57 AM, Renan DelValle  wrote:
> 
> Hi Zhitao,
> 
> Since this is something that could potentially be handled by the executor 
> and/or framework, I was wondering if you could speak to the advantages of 
> making this a TaskInfo primitive vs having the executor (or even the 
> framework) handle it.

There's some discussion around this on 
https://issues.apache.org/jira/browse/MESOS-8725.

My take is that delegating too much to the scheduler makes schedulers harder to 
write and exacerbates the complexity of the system. If 4 different schedulers 
implement this feature, operators are likely to need to understand 4 different 
ways of doing the same thing, which would be unfortunate. 

J
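To illustrate the tradeoff above, enforcing the deadline in the executor could be as small as a periodic check against an absolute timestamp carried in TaskInfo. A minimal sketch, assuming a hypothetical `deadline` field and a hypothetical status-update reason (none of these names are final, and the real implementation would live in the C++ default executors):

```python
import time

class DeadlineEnforcer:
    """Sketch of executor-side deadline tracking (names hypothetical).

    If TaskInfo carried an optional absolute deadline, the executor could
    check it locally and kill the task when it passes, sending a terminal
    status update -- no scheduler round-trip needed.
    """

    def __init__(self, deadline_secs):
        self.deadline = deadline_secs  # absolute deadline, epoch seconds

    def expired(self, now=None):
        now = time.time() if now is None else now
        return now >= self.deadline

    def poll(self, now, kill_task, send_status):
        # Called periodically from the executor's event loop.
        if self.expired(now):
            kill_task()
            # A dedicated reason would let operators distinguish deadline
            # kills from other failures (reason name is made up here).
            send_status("TASK_FAILED", reason="REASON_DEADLINE_EXCEEDED")
            return True
        return False
```

Keeping the check this local is what makes the feature cheap for every scheduler to benefit from, rather than each one re-implementing its own timer.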

Re: Release policy and 1.6 release schedule

2018-03-23 Thread Greg Mann
The best motivation I can think of for a shorter release cycle is this: if
the release cadence is fast enough, then developers will be less likely to
rush a feature into a release. I think this would be a real benefit, since
rushing features in hurts stability. *However*, I'm not sure if every two
months is fast enough to bring this benefit. I would imagine that a
two-month wait is still long enough that people wouldn't want to wait an
entire release cycle to land their feature. Just off the top of my head, I
might guess that a release cadence of 1 month or shorter would be frequent
enough that it would always seem reasonable for a developer to wait until
the next release to land a feature. What do y'all think?

Other motivating factors that have been raised are:
1) Many users upgrade on a longer timescale than every ~2 months. I think
that this doesn't need to affect our decision regarding release timing -
since we guarantee compatibility of all releases with the same major
version number, there is no reason that a user needs to upgrade minor
releases one at a time. It's fine to go from 1.N to 1.(N+3), for example.
2) Backporting will be a burden if release cycles are too short. I think that in
practice, backporting will not take too much longer. If there was a
conflict back in the tree somewhere, then it's likely that after resolving
that conflict once, the same diff can be used to backport the change to
previous releases as well.
3) Adhering strictly to a time-based release schedule will help users plan
their deployments, since they'll be able to rely on features being released
on-schedule. However, if we do strict time-based releases, then it will be
less certain that a particular feature will land in a particular release,
and users may have to wait a release cycle to get the feature.

Personally, I find the idea of preventing features from being rushed into a
release very compelling. From that perspective, I would love to see
releases every month. However, if we're not going to release that often,
then I think it does make sense to adjust our release schedule to
accommodate the features that community members want to land in a
particular release.


Jie, I'm curious why you suggest a *minimal* interval between releases.
Could you elaborate a bit on your motivations there?

Cheers,
Greg


On Fri, Mar 16, 2018 at 2:01 PM, Jie Yu  wrote:

> Thanks Greg for starting this thread!
>
>
>> My primary motivation here is to bring our documented policy in line
>> with our practice, whatever that may be
>
>
> +100
>
> Do people think that we should attempt to bring our release cadence more
>> in line with our current stated policy, or should the policy be changed
>> to reflect our current practice?
>
>
> I think a minor release every 2 months is probably too aggressive. I don't
> have concrete data, but my feeling is that the frequency that folks upgrade
> Mesos is low. I know that many users are still on 1.2.x.
>
> I'd actually suggest that we have a *minimal* interval between two
> releases (e.g., 3 months), and provide some buffer for the release process.
> (so we're expecting about 3 releases per year, this matches what we did
> last year).
>
> And we use our dev sync to coordinate on a release after the minimal
> release interval has elapsed (and elect a release manager).
>
> - Jie
>
> On Wed, Mar 14, 2018 at 9:51 AM, Zhitao Li  wrote:
>
>> An additional data point is how long it takes from the first RC being cut
>> to the final release vote passing. That probably indicates the smoothness
>> of the release process and how good the quality control measures are.
>>
>> I would argue for not delaying releases for new features, and for aligning
>> with the schedule declared in our policy. That makes it easier for upstream
>> projects to gauge when a feature will be ready and when they can try it out.
>>
>> On Tue, Mar 13, 2018 at 3:10 PM, Greg Mann  wrote:
>>
>> > Hi folks,
>> > During the recent API working group meeting [1], we discussed the
>> release
>> > schedule. This has been a recurring topic of discussion in the developer
>> > sync meetings, and while our official policy still specifies time-based
>> > releases at a bi-monthly cadence, in practice we tend to gate our
>> releases
>> > on the completion of certain features, and our releases go out on a
>> > less-frequent basis. Here are the dates of our last few release blog
>> posts,
>> > which I'm assuming correlate pretty well with the actual release dates:
>> >
>> > 1.5.0: 2/8/18
>> > 1.4.0: 9/18/17
>> > 1.3.0: 6/7/17
>> > 1.2.0: 3/8/17
>> > 1.1.0: 11/10/16
>> >
>> > Our current cadence seems to be around 3-4 months between releases,
>> while
>> > our documentation states that we release every two months [2]. My
>> primary
>> > motivation here is to bring our documented policy in line with our
>> > practice, whatever that may be. Do people think that we should attempt
>> to
>> > bring our release cadence more in line with 

Re: Support deadline for tasks

2018-03-23 Thread Renan DelValle
Hi Zhitao,

Since this is something that could potentially be handled by the executor
and/or framework, I was wondering if you could speak to the advantages of
making this a TaskInfo primitive vs having the executor (or even the
framework) handle it.

-Renan


On Fri, Mar 23, 2018 at 9:19 AM, Zhitao Li  wrote:

> Thanks James. I'll update the JIRA with our names and start with some
> prototype.
>
> On Thu, Mar 22, 2018 at 9:07 PM, James Peach  wrote:
>
>>
>>
>> > On Mar 22, 2018, at 10:06 AM, Zhitao Li  wrote:
>> >
>> > In our environment, we run a lot of batch jobs, some of which have
>> tight timelines. If any task in the job runs longer than x hours, it does
>> not make sense to keep running it.
>> >
>> > For instance, a team would submit a job which builds a weekly index and
>> repeats every Monday. If the job does not finish before next Monday for
>> whatever reason, there is no point to keep any task running.
>> >
>> > We believe that implementing deadline tracking distributed across our
>> cluster makes more sense as it makes the system more scalable and also
>> makes our centralized state machine simpler.
>> >
>> > One idea I have right now is to add an optional TimeInfo deadline field
>> to TaskInfo, and all default executors in Mesos can simply terminate the
>> task and send a proper StatusUpdate.
>> >
>> > I summarized the above idea in MESOS-8725.
>> >
>> > Please let me know what you think. Thanks!
>>
>> This sounds both useful and simple to implement. I’m happy to shepherd if
>> you’d like
>>
>> J
>
>
>
>
> --
> Cheers,
>
> Zhitao Li
>


Re: Support deadline for tasks

2018-03-23 Thread Zhitao Li
Thanks James. I'll update the JIRA with our names and start with some
prototype.

On Thu, Mar 22, 2018 at 9:07 PM, James Peach  wrote:

>
>
> > On Mar 22, 2018, at 10:06 AM, Zhitao Li  wrote:
> >
> > In our environment, we run a lot of batch jobs, some of which have tight
> timelines. If any task in the job runs longer than x hours, it does not
> make sense to keep running it.
> >
> > For instance, a team would submit a job which builds a weekly index and
> repeats every Monday. If the job does not finish before next Monday for
> whatever reason, there is no point to keep any task running.
> >
> > We believe that implementing deadline tracking distributed across our
> cluster makes more sense as it makes the system more scalable and also
> makes our centralized state machine simpler.
> >
> > One idea I have right now is to add an optional TimeInfo deadline field
> to TaskInfo, and all default executors in Mesos can simply terminate the
> task and send a proper StatusUpdate.
> >
> > I summarized the above idea in MESOS-8725.
> >
> > Please let me know what you think. Thanks!
>
> This sounds both useful and simple to implement. I’m happy to shepherd if
> you’d like
>
> J




-- 
Cheers,

Zhitao Li


Re: Mesos scalability

2018-03-23 Thread Harold Dost
Karan, to answer question 1: you can set up a DNS name that includes all of
your masters, and because of
https://github.com/apache/mesos/blob/master/docs/scheduler-http-api.md#master-detection
the leading master will always be the one that ultimately handles the
traffic.
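In client code, following that detection scheme amounts to handling the non-leader redirect. A minimal sketch of the decision logic — the 307-with-Location behavior comes from the linked doc, while the helper name here is made up:

```python
def leader_from_redirect(status, location):
    """Decide where to send scheduler traffic after probing any master.

    Per the master-detection doc linked above, a non-leading master
    answers with a 307 Temporary Redirect whose Location header names
    the current leader, while the leading master serves the request
    itself. A real client would also re-resolve the DNS name and retry
    when the leader changes mid-session.
    """
    if status == 307 and location:
        return location  # follow the redirect to the leading master
    return None          # this master handled the request: it is the leader
```

So a scheduler can connect to any name behind the DNS entry and converge on the leader after at most one redirect.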



*Harold Dost* | @hdost

On Fri, Mar 23, 2018 at 4:40 PM, daemeon reiydelle 
wrote:

> Even clustered, one must look at the specific event-generating workloads,
> as well as monitor the target systems for utilization.
>
>
> <==>
> "Who do you think made the first stone spear? The Asperger guy.
> If you get rid of the autism genetics, there would be no Silicon Valley"
> Temple Grandin
>
>
> *Daemeon C.M. Reiydelle*
> San Francisco 1.415.501.0198
> London 44 020 8144 9872
>
>
> On Thu, Mar 22, 2018 at 2:26 PM, Karan Pradhan 
> wrote:
>
>> Hi All,
>>
>> I had the following questions:
>> 1.
>> I was wondering if it is possible to have multiple Mesos masters as
>> elected masters in a Mesos cluster so that the load can be balanced amongst
>> the masters. Is there a way to achieve this?
>> In general, can there be a load balancer for the Mesos masters?
>>
>> 2.
>> I have seen spikes in the Mesos event queues while running Spark SQL
>> workloads with multiple stages. So I was wondering what is a better way to
>> handle these scalability issues. I noticed that compute intensive machines
>> were able to deal with those workloads better. Is there a particular
>> hardware requirement or requirement for the number of masters for scaling a
>> Mesos cluster horizontally? After reading success stories which mention
>> that Mesos is deployed for ~10K machines, I was curious about the hardware
>> used and the number of masters in this case.
>>
>> It would be awesome if I could get some insight into these questions.
>>
>> Thanks,
>> Karan
>>
>>
>


Re: Mesos scalability

2018-03-23 Thread daemeon reiydelle
Even clustered, one must look at the specific event-generating workloads,
as well as monitor the target systems for utilization.


<==>
"Who do you think made the first stone spear? The Asperger guy.
If you get rid of the autism genetics, there would be no Silicon Valley"
Temple Grandin


*Daemeon C.M. Reiydelle*
San Francisco 1.415.501.0198
London 44 020 8144 9872


On Thu, Mar 22, 2018 at 2:26 PM, Karan Pradhan 
wrote:

> Hi All,
>
> I had the following questions:
> 1.
> I was wondering if it is possible to have multiple Mesos masters as
> elected masters in a Mesos cluster so that the load can be balanced amongst
> the masters. Is there a way to achieve this?
> In general, can there be a load balancer for the Mesos masters?
>
> 2.
> I have seen spikes in the Mesos event queues while running Spark SQL
> workloads with multiple stages. So I was wondering what is a better way to
> handle these scalability issues. I noticed that compute intensive machines
> were able to deal with those workloads better. Is there a particular
> hardware requirement or requirement for the number of masters for scaling a
> Mesos cluster horizontally? After reading success stories which mention
> that Mesos is deployed for ~10K machines, I was curious about the hardware
> used and the number of masters in this case.
>
> It would be awesome if I could get some insight into these questions.
>
> Thanks,
> Karan
>
>