Re: Future transfer of MesosCon 2015 videos

2023-03-26 Thread Vinod Kone
Thanks Dave for driving this. 

Thanks,
Vinod

> On Mar 26, 2023, at 11:57 AM, Dave Lester  wrote:
> 
> I assume the clarification is about MesosCon 2014 videos. I believe Twitter 
> owns the copyright to the videos which were recorded without cost to the 
> event by a Twitter staff member and published to the company's "Twitter 
> University" YouTube channel. I don't believe those videos were licensed so 
> we'd need their sign-off before copying anything.
> 
> It doesn't sound like there's a concern or objection regarding the proposal 
> to migrate the MesosCon 2015 videos to the LF YouTube channel, but I encourage 
> folks to chime in if they have additional questions or feedback.
> 
>> On 2023/03/26 10:21:49 Marc wrote:
>> 
>> What is the problem with just copying them?
>> 
>>> 
>>> Clarifying my earlier message: MesosCon 2015 videos will be migrated to
>>> the LF YouTube channel unless a concern is raised within 72 hours.
>>> 
>>> I believe all MesosCon 2015 videos were released under a Creative
>>> Commons Attribution license (this can be confirmed by viewing the notice
>>> beside individual videos or in their descriptions).
>>> 
>>> Unfortunately, while video from the conference's first event, MesosCon
>>> 2014, was recorded
>>> (https://www.youtube.com/playlist?list=PLDVc2EaAVPg9kp8cFzjR1Yxj96I4U5EGN),
>>> I assume that Twitter owns the rights to the recordings. If someone
>>> still working at Twitter can get the OK to migrate these as well, I'm
>>> happy to make the appropriate LF connections.
>>> 
>> 


New PMC Chair

2021-04-29 Thread Vinod Kone
Hi community,

Just wanted to let you all know that the board passed the resolution to
elect a new PMC chair!

Hearty congratulations to *Qian Zhang* for becoming the new Apache Mesos
PMC chair and VP of the project.

Thanks,


[VOTE] Move Apache Mesos to Attic

2021-04-05 Thread Vinod Kone
Hi folks,

Based on the recent conversations on our mailing list, it seems to me that
the majority consensus among the existing PMC is to move the project to the
attic and let the interested community members collaborate on a fork on GitHub.

I would like to call a vote to dissolve the PMC and move the project to the
attic.

Please reply to this thread with your vote. Only binding votes from
PMC/committers count towards the final tally, but everyone in the community
is encouraged to vote. See the process here.

Thanks,


Re: Next Steps

2021-03-15 Thread Vinod Kone
>
>
> How many man hours were spent on Mesos in 2020, 2019 and 2018?
>
>
Roughly 5-6 people (in 2020), 10-11 (in 2019), 16-18 (in 2018)


Re: Next Steps

2021-03-15 Thread Vinod Kone
Hi folks,

Sorry for the radio silence on my part for the last couple weeks. My Apache
emails were not getting delivered to my inbox due to some filter mixup on
my end. Sorry about that.

I've read through the various threads and here's how I summarize the
situation. We basically have two camps:

*Attic:*
Most existing PMC members who have chimed in so far seem to be in favor
of moving the project to the Attic. The exception is Qian (who is willing to
step up to be the new PMC chair, thanks Qian!). The main argument for this
seems to be that it'll be hard to re-activate the project at this juncture
with new PMC members / committers, and that the Attic signals the current
state of the project more accurately.

*Re-activate:*
There are some active users in the community who would like to see this
project stay alive and are even willing to step up to become committers /
contributors. Some of these users work for companies that are using
Mesos in production. They would like to know the potential new roadmap (there
is a separate thread going on for this) and the manpower needed (my take is
6-8 people to cover the different areas of the project).

*My take:*

In addition to the public threads, we've had a thread on our private
mailing list to see which of the current committers are interested in being
active. So far that thread has gotten *0* responses. This is unfortunate
because it means that, except for Qian, no existing committers/PMC members
are willing or able to contribute or mentor new contributors.

Additionally, the current guidelines we have for adding new committers set a
pretty high bar, and I don't think any of the current contributors would be
immediately eligible to be voted in as committers. This means we either need
to change the guidelines or have some existing committers mentor some of the
contributors into committers. Given the lack of commitment from most of the
existing PMC, this would fall solely on Qian's shoulders, which is quite a
burden.

Since the existing committers are unable or unwilling to mentor new
contributors into new committers, I think moving the project to attic is
the right move. If there is no objection to this, I'm happy to call a vote
for this.

We could still explore the possibility of activating
https://github.com/mesos/mesos as the one true fork outside of the ASF so
that the interested parties can still contribute and collaborate. And if the
project continues to thrive there, we can reach back out to the ASF to
re-activate the project down the line.

Thanks,


On Sat, Feb 27, 2021 at 7:45 AM Damien GERARD  wrote:

> On 2021-02-26 09:05 PM, Charles-François Natali wrote:
> > As mentioned before I'd also be happy to contribute.
> >
> > Concretely, what's the next step to move this forward?
> >
> > On Fri, 26 Feb 2021, 11:15 Thomas Langé,  wrote:
> >
> >> Hello,
> >>
> >> I'm part of Criteo team as well, and as Grégoire said, we plan to
> >> support Mesos internally for some time. I would like to
> >> propose my help as well as a committer, and contribute as much as I
> >> can to this project.
>
> At Rakuten we also have a couple of clusters. As also mentioned before,
> happy to contribute.
> But yeah, we need a plan of action :p
>
>
> >>
> >> Br,
> >>
> >> Thomas
> >>
> >> -
> >>
> >> From: Grégoire Seux 
> >> Sent: Friday, 26 February 2021 11:12
> >> To: priv...@mesos.apache.org ; dev
> >> ; user 
> >> Subject: Re: Next Steps
> >>
> >> Hello all,
> >>
> >> here at Criteo, we heavily use Mesos and plan to do so for a
> >> foreseeable future alongside other alternatives.
> >> I am ok to become committer and help the project if you are looking
> >> for contributors.
> >> It seems finding committers will be doable but finding a PMC chair
> >> will be difficult.
> >>
> >> To give some context on our usage, Criteo is running 12 Mesos
> >> clusters running a light fork of Mesos 1.9.x.
> >> Each cluster has 10+ distinct Marathon frameworks, a Flink
> >> framework, an instance of Aurora, and an in-house framework.
> >> We strongly appreciate the ability to scale the number of nodes
> >> (3500 on the largest cluster and growing), the simplicity of the
> >> project overall and the extensibility through modules.
> >>
> >> --
> >>
> >> Grégoire
>
> --
> Damien GERARD
>


Re: Next Steps

2021-02-18 Thread Vinod Kone
Good to see some interest in helping with project maintenance. 

Qian, can you start a new email thread about figuring out the roadmap for the project?

Thanks,
Vinod

> On Feb 18, 2021, at 11:18 AM, Charles-François Natali  
> wrote:
> 
> Speaking as someone who contributed a few patches and would like to get
> more involved, I find it a bit difficult to get MRs reviewed and merged.
> I think it's probably because the current committers have other priorities
> now that D2iQ focus has shifted, which is understandable but makes it
> harder for outsiders to contribute.
> Is there anything which could be done about that?
> 
> Cheers,
> 
> 
> 
>> On Thu, 18 Feb 2021, 14:30 Qian Zhang,  wrote:
>> 
>> Hi Vinod,
>> 
>> I am still interested in the project. As other folks said, we need to have
>> a direction for the project. I think there are still a lot of Mesos
>> users/customers on the mailing list; can you please send another mail to
>> collect their requirements / pain points on Mesos? Then we can try to
>> set up a roadmap for the project to move forward.
>> 
>> 
>> Regards,
>> Qian Zhang
>> 
>> 
>> On Thu, Feb 18, 2021 at 9:16 PM Andrei Sekretenko 
>> wrote:
>> 
>>> IIUC, Attic is not intended for projects which still have active users
>>> and thus might be in need of fixing bugs.
>>> 
>>> Key items about moving project to Attic:
 It is not intended to:
 - Rebuild community
 - Make bugfixes
 - Make releases
>>> 
 Projects whose PMC are unable to muster 3 votes for a release, who have
 no active committers or are unable to fulfill their reporting duties to
 the board are all good candidates for the Attic.
>>> 
>>> As a D2iQ employee, I can say that if we find a bug critical for our
>>> customers, we will be interested in fixing that. Should the project be
>>> moved into Attic, the fix will be present only in forks (which might
>>> mean our internal forks).
>>> 
>>> I could imagine that other entities and people using Mesos are in a
>>> similar position with regards to bugfixes.
>>> If this is true, then moving the project to Attic in the near future
>>> is not a proper solution to the issue of insufficient bandwidth of the
>>> active PMC members/chair.
>>> 
>>> ---
>>> A long-term future of the project is a different story, which, in my
>>> personal view, will "end" either in moving the project into Attic or
>>> in shifting the project direction from what it used to be in the
>>> recent few years to something substantially different. IMO, this
>>> requires a  _separate_ discussion.
>>> 
>>> Damien's questions sound like a good starting point for that
>>> discussion, I'll try to answer them from my committer/PMC member
>>> perspective when I have enough time.
>>> 
>>> On Thu, 18 Feb 2021 at 12:49, Charles-François Natali
>>>  wrote:
 
 Thanks Tomek, that's what I suspected.
 It would therefore make it much more difficult for anyone to carry on
>>> since it would effectively have to be a fork, etc.
 I think it'd be a bit of a shame, but I understand Benjamin's point.
 I hope it can be avoided.
 
 
 Cheers,
 
 
 
 On Thu, 18 Feb 2021, 11:02 Tomek Janiszewski, 
>>> wrote:
> 
> Moving to the Attic makes the project read-only:
> https://attic.apache.org/
> https://attic.apache.org/projects/aurora.html
> 
> czw., 18 lut 2021, 11:56 użytkownik Charles-François Natali <
>>> cf.nat...@gmail.com> napisał:
>> 
>> I'm not familiar with the attic but would it still allow to actually
>> develop, make commits to the repository etc?
>> 
>> 
>> On Thu, 18 Feb 2021, 08:27 Benjamin Bannier, 
>>> wrote:
>> 
>>> Hi Vinod,
>>> 
 I would like to start a discussion around the future of the Mesos
 project.
 
 As you are probably aware, the number of active committers and
 contributors to the project have declined significantly over time. As of
 today, there's no active development of any features or a public release
 planned. On the flip side, I do know there are a few companies who are
 still actively using Mesos.
>>> 
>>> Thanks for starting this discussion Vinod. Looking at Slack, mailing
>>> lists, JIRA and reviewboard/github, the project has wound down a lot
>>> in the last 12+ months.
>>> 
 Given that, we need to assess if there's interest in the community to
 keep this project moving forward. Specifically, we need some active
 committers and PMC members who are going to manage the project. Ideally,
 these would be people who are using Mesos in some capacity and can make
 code contributions.
>>> 
>>> While I have seen a few non-committer folks contribute patches in
>>> the last months, I feel it might be too late to bootstrap an active
>>> community at this point.
>>> 
>>> Apache Mesos is still mentioned 

Next Steps

2021-02-17 Thread Vinod Kone
Hi folks,

I would like to start a discussion around the future of the Mesos project.

As you are probably aware, the number of active committers and contributors
to the project have declined significantly over time. As of today, there's
no active development of any features or a public release planned. On the
flip side, I do know there are a few companies who are still actively using
Mesos.

Given that, we need to assess if there's interest in the community to keep
this project moving forward. Specifically, we need some active committers
and PMC members who are going to manage the project. Ideally, these would
be people who are using Mesos in some capacity and can make code
contributions.

If there is no active interest, we will likely need to figure out steps for
retiring the project.

*Call for action: If you are interested in becoming a committer/PMC member
(including PMC chair) and actively maintain the project, please reply to
this email.*

I personally don't foresee myself being very active in the Mesos project
going forward, so I'm planning to step down from my chair role as soon as
we find a replacement.

Thanks,
Vinod


Re: [BULK]Re: cgroup CPUSET for mesos agent

2021-01-14 Thread Vinod Kone
Great to hear! Thanks for the update.

On Thu, Jan 14, 2021 at 5:18 PM Charles-François Natali 
wrote:

> It's a bit old but in case it could help, we recently implemented this
> at work - here's how we did it:
> - the NUMA topology is exposed via agent custom resources
> - the framework does the allocation of the corresponding resources to
> the tasks according to the NUMA topology: e.g. if the task requests 2
> CPUs within the same NUMA node, the framework would allocate them
> - a custom executor then implements the CPU affinity/cpuset using the
> resources provided by the framework
>
> It works really nicely.
>
> Cheers,
>
> Charles
>
>
> Le mar. 7 juil. 2020 à 18:12, Milind Chabbi  a écrit :
> >
> > Grégoire, thanks for your reply. This is super helpful to make a
> stronger case around the affinity benefits.
> > Would you be able to offer additional details that you mentioned? I am
> definitely interested.
> > Is your isolator source code publicly available?
> >
> > -Milind
> >
> > On Tue, Jul 7, 2020 at 3:14 AM Grégoire Seux  wrote:
> >>
> >> Hello,
> >>
> >> I'd like to share a bit of our experience, because we've worked on
> this last year.
> >> We've used CFS bandwidth isolation for several years and encountered
> many issues (lack of predictability, bugs present in old Linux kernels, and
> lack of cache/memory locality). At some point, we implemented a custom
> isolator to manage cpusets (using
> https://github.com/criteo/mesos-command-modules/ as a base to write an
> isolator in a scripting language).
> >>
> >> The isolator had a very simple behavior: upon a new task, look at which
> CPUs are not already within a cpuset cgroup, select (if possible) CPUs from
> the same NUMA node, and create a cpuset cgroup for the starting task.
> >> In practice, it provided a general decrease in CPU consumption (up to
> 8% for some CPU-intensive applications) and a better ability to reason
> about the CPU isolation model.
> >> The allocation is optimistic: it tries to use CPUs from the same NUMA
> node, but if that's not possible, the task is spread across nodes. In
> practice this happens very rarely because of one small optimization: assign
> CPUs from the most loaded NUMA node (decreasing fragmentation of available
> CPUs across NUMA nodes).
> >>
> >> I'd be glad to give more details if you are interested
> >>
> >> --
> >> Grégoire
>
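The optimistic allocation Grégoire describes (and the NUMA-aware placement Charles's framework performs) can be sketched as a small pure function. This is a hedged illustration of the idea only, not the actual Criteo or Rakuten isolator code; the topology and bookkeeping structures are invented for the example:

```python
def allocate_cpus(topology, in_use, needed):
    """Pick `needed` CPUs for a new task.

    Prefer a single NUMA node; among nodes that can satisfy the request,
    choose the most loaded one (fewest free CPUs) to reduce fragmentation.
    Fall back to spreading across nodes, returning None if CPUs run out.
    """
    free = {node: [c for c in cpus if c not in in_use]
            for node, cpus in topology.items()}
    # Nodes that can satisfy the request alone, most loaded first.
    viable = sorted((n for n in free if len(free[n]) >= needed),
                    key=lambda n: len(free[n]))
    if viable:
        return free[viable[0]][:needed]
    # Rarely needed: spread the task across NUMA nodes.
    spread = [c for cpus in free.values() for c in cpus]
    return spread[:needed] if len(spread) >= needed else None
```

A real isolator would then write the chosen CPUs into the task's cpuset cgroup (`cpuset.cpus`) and pin `cpuset.mems` to the matching node.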


Re: Paid help for getting csi ceph working

2020-09-08 Thread Vinod Kone
SLRP is not available yet.

We are currently working on an alternative way to get external storage into
Mesos instead of using SLRP.  Please watch the progress here:
https://issues.apache.org/jira/browse/MESOS-10141 . MVP support will land
in the upcoming release of Mesos.

On Mon, Sep 7, 2020 at 2:08 PM Marc Roos  wrote:

>
>
> Is there anyone interested in providing some paid help to get me up and
> running with an SLRP with Ceph? I assume this SLRP is still not
> available?
>
>
>
>


Re: mesos master default drop acl

2020-08-07 Thread Vinod Kone
Not sure if you came across
http://mesos.apache.org/documentation/latest/authorization/ but I hope it
can answer your questions.

On Thu, Jul 30, 2020 at 4:03 PM Marc Roos  wrote:

>
>
> Currently I am running a testing environment with some default ACLs I
> found [1]. I have configured Mesos credentials, and AFAIK everything
> (agents, the Marathon framework) is authenticating. So I thought about
> converting the ACLs to default drop/deny. However, I see there are quite a
> few options.
>
> Is it advisable to set them all to deny? Is there an example of how to
> set the URL for GetEndpoint?
>
> [2]
>
> https://github.com/apache/mesos/blob/master/include/mesos/authorizer/acls.proto
> http://mesos.apache.org/documentation/latest/configuration/master/
>
> [1]
> {
>   "run_tasks": [
> {
>   "principals": {
> "type": "ANY"
>   },
>   "users": {
> "type": "ANY"
>   }
> }
>   ],
>   "register_frameworks": [
> {
>   "principals": {
> "type": "ANY"
>   },
>   "roles": {
> "type": "ANY"
>   }
> }
>   ]
> }
>
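For a default-deny starting point, the ACLs proto has a top-level `permissive` field: with `"permissive": false`, any request that matches no ACL is denied, so you then whitelist only what you need. Below is a hedged sketch extending [1] — the principal names are made up, and the field names (especially `get_endpoints`/`paths` for the GetEndpoint case) should be double-checked against acls.proto and the authorization docs before use:

```json
{
  "permissive": false,
  "register_frameworks": [
    {
      "principals": { "values": ["marathon"] },
      "roles": { "type": "ANY" }
    }
  ],
  "run_tasks": [
    {
      "principals": { "values": ["marathon"] },
      "users": { "type": "ANY" }
    }
  ],
  "get_endpoints": [
    {
      "principals": { "type": "ANY" },
      "paths": { "values": ["/metrics/snapshot"] }
    }
  ]
}
```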


Re: getting correct metrics port from SRV records.

2020-07-27 Thread Vinod Kone
+Jason Kölker 

On Mon, Jul 27, 2020 at 1:38 PM Marc Roos  wrote:

>
> Is there a way to identify the correct port via DNS? I have created a
> task with two ports [1], but a DNS SRV query does not show anything
> beyond the port numbers [2]. How can I identify the correct port? The
> mesos-master tasks endpoint [3] shows the port names; is there a way to
> get these from DNS?
>
>
> [1]
> "networks": [ { "mode": "host"} ],
> "portDefinitions": [{"port": 0, "name": "https",  "protocol": "tcp"},
> {"port": 0, "name": "metrics",  "protocol": "tcp"}]
>
>
> [2]
> [@test2 image-synapse]$ dig +short @192.168.10.14
> _synapse.dev._tcp.marathon.mesos SRV
> 0 1 31031 synapse.dev-nppzf-s0.marathon.mesos.
> 0 1 31032 synapse.dev-nppzf-s0.marathon.mesos.
>
>
> [3]
> mesos-master /tasks/
>
>   "discovery": {
> "visibility": "FRAMEWORK",
> "name": "synapse.dev",
> "ports": {
>   "ports": [
> {
>   "number": 31031,
>   "name": "https",
>   "protocol": "tcp"
> },
> {
>   "number": 31032,
>   "name": "metrics",
>   "protocol": "tcp"
> }
>   ]
> }
>   },
>
>
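Since the SRV answer in [2] only carries port numbers, one approach is to cross-reference the master's /tasks output shown in [3]: fetch the task's DiscoveryInfo and look the port up by name. A small sketch of that lookup, assuming the JSON shape quoted above:

```python
def port_by_name(discovery, name):
    """Return the port number whose DiscoveryInfo name matches, else None."""
    for port in discovery.get("ports", {}).get("ports", []):
        if port.get("name") == name:
            return port.get("number")
    return None
```

With the `"discovery"` object from the /tasks excerpt above, `port_by_name(discovery, "metrics")` picks out 31032, which can then be matched against the SRV answers.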


Re: fyi: mesos-dns is not registering all ip addresses

2020-07-27 Thread Vinod Kone
+Jason Kölker 

On Mon, Jul 27, 2020 at 8:03 AM Alex Evonosky 
wrote:

> Thank you Marc for the clarification.
>
>
>
> On Mon, Jul 27, 2020 at 8:50 AM Marc Roos 
> wrote:
>
>>
>> Hi Alex,
>>
>> My config.json is quite similar, but with "IPSources": ["netinfo",
>> "mesos", "host"].
>>
>> You will only run into this issue with multihomed tasks, i.e. tasks
>> having two or more network interfaces (eth0, eth1, etc.).
>>
>>
>>
>>
>> -Original Message-
>> From: Alex Evonosky [mailto:alex.evono...@gmail.com]
>> Sent: maandag 27 juli 2020 14:36
>> To: user@mesos.apache.org
>> Subject: Re: fyi: mesos-dns is not registering all ip addresses
>>
>> thank you.
>>
>> We have been running mesos-dns for years now without any issues. The
>> docker apps spin up on Marathon and automatically get picked up by
>> mesos-dns...
>>
>> This is our config.json:
>>
>>
>> {
>>   "zk": "zk://10.10.10.51:2181,10.10.10.52:2181,10.10.10.53:2181/mesos",
>>   "masters": ["10.10.10.51:5050", "10.10.10.52:5050",
>> "10.10.10.53:5050"],
>>   "refreshSeconds": 3,
>>   "ttl": 3,
>>   "domain": "mesos",
>>   "port": 53,
>>   "resolvers": ["10.10.10.88", "10.10.10.86"],
>>   "timeout": 3,
>>   "httpon": true,
>>   "dnson": true,
>>   "httpport": 8123,
>>   "externalon": true,
>>   "listener": "0.0.0.0",
>>   "SOAMname": "ns1.mesos",
>>   "SOARname": "root.ns1.mesos",
>>   "SOARefresh": 5,
>>   "SOARetry":   600,
>>   "SOAExpire":  86400,
>>   "SOAMinttl": 5,
>>   "IPSources":["mesos", "host"]
>> }
>>
>>
>>
>>
>> We just have our main DNS resolvers host a zone "mesos.marathon" and
>> forward the requests to this cluster...
>>
>>
>>
>> On Mon, Jul 27, 2020 at 3:56 AM Marc Roos 
>> wrote:
>>
>>
>>
>>
>> I am not sure if mesos-dns is discontinued. But for the ones still
>> using it, in some cases it does not register all of a task's IP
>> addresses.
>>
>> The default [2] works, but if you have this setup [1] it will only
>> register one IP address (192.168.122.140) and not the second. I filed
>> an issue a year or so ago [3].
>>
>>
>>
>> [3]
>> https://github.com/mesosphere/mesos-dns/issues/54145
>> https://issues.apache.org/jira/browse/MESOS-10164
>>
>> [1]
>> "network_infos": [
>>   {
>> "ip_addresses": [
>>   {
>> "protocol": "IPv4",
>> "ip_address": "192.168.122.140"
>>   }
>> ]
>>   },
>>   {
>> "ip_addresses": [
>>   {
>> "protocol": "IPv4",
>> "ip_address": "192.168.10.17"
>>   }
>> ],
>>   }
>> ]
>>
>>
>> [2]
>> "network_infos": [
>>   {
>> "ip_addresses": [
>>   {
>> "protocol": "IPv4",
>> "ip_address": "12.0.1.2"
>>   },
>>   {
>> "protocol": "IPv6",
>> "ip_address": "fd01:b::1:8000:2"
>>   }
>> ],
>>   }
>> ]
>>
>>
>>
>>
>>
>>


Re: Mesos syslog logging to error level instead of info?

2020-07-24 Thread Vinod Kone
I don't think I understand the problem here. Could you please elaborate more
on what you are expecting and what you are seeing? Also, the flags and env
variables for the mesos-master process that you are using would be good to
include.

On Fri, Jul 24, 2020 at 5:30 AM Marc Roos  wrote:

>
>
> I have my test cluster of Mesos on again, and mesos-master logs are ending
> up in the wrong logs. I think Mesos is not logging at the correct
> levels/facility (using mesos-1.10.0-2.0.1.el7.x86_64).
>
> E.g. I have got this at level error:
>
> Jul 24 12:25:16 m01 mesos-master[28922]: I0724 12:25:16.854624 28955
> master.cpp:8889] Performing explicit task state reconciliation for 1
> tasks of framework 43d5a67d-8c4e-496e-a108-5cfeb10b8967- (marathon)
> at scheduler-a9897343-98ee-4c31-a715-1b5e96e296bb@192.168.10.22:41009
> Jul 24 12:25:20 m01 mesos-master[28922]: I0724 12:25:20.557858 28957
> authorization.cpp:136] Authorizing principal 'ANY' to GET the endpoint
> '/metrics/snapshot'
> Jul 24 12:25:24 m01 mesos-master[28922]: I0724 12:25:24.738281 28957
> authorization.cpp:136] Authorizing principal 'ANY' to GET the endpoint
> '/metrics/snapshot'
> Jul 24 12:25:26 m01 mesos-master[28922]: I0724 12:25:26.547469 28958
> authorization.cpp:136] Authorizing principal 'ANY' to GET the endpoint
> '/metrics/snapshot'
> Jul 24 12:25:26 m01 mesos-master[28922]: I0724 12:25:26.554080 28961
> http.cpp:1436] HTTP GET for /master/state?jsonp=angular.callbacks._fmv
> from 192.168.10.219:49885 with User-Agent='Mozilla/5.0 (Windows NT 6.1;
> Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'
> Jul 24 12:25:26 m01 mesos-master[28922]: I0724 12:25:26.556207 28956
> http.cpp:1453] HTTP GET for /master/state?jsonp=angular.callbacks._fmv
> from 192.168.10.219:49885: '200 OK' after 2.46784ms
> Jul 24 12:25:26 m01 mesos-master[28922]: I0724 12:25:26.582295 28955
> http.cpp:1436] HTTP GET for
> /master/maintenance/schedule?jsonp=angular.callbacks._fmw from
> 192.168.10.219:63372 with User-Agent='Mozilla/5.0 (Windows NT 6.1;
> Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'
> Jul 24 12:25:30 m01 mesos-master[28922]: I0724 12:25:30.635844 28955
> authorization.cpp:136] Authorizing principal 'ANY' to GET the endpoint
> '/metrics/snapshot'
> Jul 24 12:25:31 m01 mesos-master[28922]: I0724 12:25:31.874604 28955
> master.cpp:8889] Performing explicit task state reconciliation for 1
> tasks of framework 43d5a67d-8c4e-496e-a108-5cfeb10b8967- (marathon)
> at scheduler-a9897343-98ee-4c31-a715-1b5e96e296bb@192.168.10.22:41009
> Jul 24 12:25:34 m01 mesos-master[28922]: I0724 12:25:34.816028 28958
> authorization.cpp:136] Authorizing principal 'ANY' to GET the endpoint
> '/metrics/snapshot'
> Jul 24 12:25:36 m01 mesos-master[28922]: I0724 12:25:36.625381 28955
> authorization.cpp:136] Authorizing principal 'ANY' to GET the endpoint
> '/metrics/snapshot'
> Jul 24 12:25:36 m01 mesos-master[28922]: I0724 12:25:36.632581 28956
> http.cpp:1436] HTTP GET for /master/state?jsonp=angular.callbacks._fn0
> from 192.168.10.219:49885 with User-Agent='Mozilla/5.0 (Windows NT 6.1;
> Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'
> Jul 24 12:25:36 m01 mesos-master[28922]: I0724 12:25:36.634801 28959
> http.cpp:1453] HTTP GET for /master/state?jsonp=angular.callbacks._fn0
> from 192.168.10.219:49885: '200 OK' after 2.55488ms
> Jul 24 12:25:36 m01 mesos-master[28922]: I0724 12:25:36.687845 28958
> http.cpp:1436] HTTP GET for
> /master/maintenance/schedule?jsonp=angular.callbacks._fn1 from
> 192.168.10.219:63372 with User-Agent='Mozilla/5.0 (Windows NT 6.1;
> Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'
>
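One likely explanation (an assumption worth verifying against the actual setup): glog tags each line with a severity letter — I (info), W (warning), E (error), F (fatal) — but the master writes those lines to stderr, and whatever forwards stderr into syslog applies one fixed priority, so INFO lines like the ones quoted above arrive at the error level. A syslog-side filter could recover the intended level from the glog prefix; a minimal sketch:

```python
import re

# Glog lines look like "I0724 12:25:16.854624 28955 master.cpp:8889] ...".
GLOG_PREFIX = re.compile(r"^([IWEF])\d{4} ")

SEVERITY_TO_SYSLOG = {"I": "info", "W": "warning", "E": "err", "F": "crit"}

def syslog_level(line, default="err"):
    """Map a glog severity letter to a syslog level name."""
    m = GLOG_PREFIX.match(line)
    return SEVERITY_TO_SYSLOG[m.group(1)] if m else default
```

In practice the same mapping could be expressed as an rsyslog or syslog-ng rewrite rule keyed on the leading I/W/E/F character.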


Re: Subject: [VOTE] Release Apache Mesos 1.10.0 (rc1)

2020-05-26 Thread Vinod Kone
+1 (binding)

Thanks for looking into it. Let's fix this in a point release.

On Tue, May 26, 2020 at 8:31 AM Andrei Sekretenko 
wrote:

> Thanks for checking this!
>
> The first one (centos, non-SSL, gcc, autotools) seems to be a race between
> several instances of `javah` attempting to check for existence and create
> the output directory.
> I believe there were no related changes in 1.10.x compared to 1.9.x.
>
> The second one (ubuntu, SSL, clang, autotools) is somewhat tricky.
> The immediate cause of the failure seems to be an attempt to compile
> src/tests/http_tests.proto with a not yet built protoc.
> src/tests/http_tests.proto has been added in 1.10; there were no
> tests-only protobuf definitions in Mesos before that.
> However, I'm not quite getting how protobuf compilation in the automake
> build is supposed to work at all with a bundled protoc.
>
> When the bundled protobuf is used, I don't see any dependency on protoc
> injected into the pb.cc/pb.h targets in the generated Makefile.
> Neither do I see how src/Makefile.am is supposed to introduce this
> dependency.
> (See
> https://github.com/apache/mesos/blob/5a04a1693e4f1d51007c23728f1884a307e229a1/src/Makefile.am#L499
> and below).
> Looks like all other protobufs (usually?) compile due to sheer luck.
>
> The workaround for the javah race and the fix for missing dependency on
> protoc seem to be rather straightforward.
> If any of these two should be considered a blocker for 1.10.0, please vote
> -1.
>
>
>
>
>
> On Tue, May 19, 2020 at 6:55 PM Vinod Kone  wrote:
>
>> Ran it in Apache CI. Found 2 build issues (issue 1
>> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/77/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--disable-libtool-wrappers%20--disable-parallel-test-execution,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop%7C%7Cbeam)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console>,
>> issue 2
>> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/77/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=--verbose%20--disable-libtool-wrappers%20--disable-parallel-test-execution%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A16.04,label_exp=(docker%7C%7CHadoop%7C%7Cbeam)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console>)
>> which seem to be related to race condition due to parallel build.
>>
>> @Andrei Sekretenko  Can you confirm this is
>> not a regression in the build system?
>>
>> *Revision*: 1fb36dcc5a0099f147cd01bd82cd7b4f0aec2256
>>
>>- refs/tags/1.10.0-rc1
>>
>> Configuration Matrix (common flags: --verbose --disable-libtool-wrappers
>> --disable-parallel-test-execution; SSL adds --enable-libevent --enable-ssl):
>>
>>                                        gcc       clang
>> centos:7      SSL      autotools      Success   (not run)
>> centos:7      SSL      cmake          Success   (not run)
>> centos:7      non-SSL  autotools      Failed    (not run)
>> centos:7      non-SSL  cmake          Success   (not run)
>> ubuntu:16.04  …
Re: Subject: [VOTE] Release Apache Mesos 1.10.0 (rc1)

2020-05-19 Thread Vinod Kone
Ran it in Apache CI. Found 2 build issues (issue 1, issue 2) which seem to
be related to race conditions due to the parallel build.

@Andrei Sekretenko: Can you confirm this is not a regression in the build
system?

*Revision*: 1fb36dcc5a0099f147cd01bd82cd7b4f0aec2256

   - refs/tags/1.10.0-rc1

Configuration Matrix (common flags: --verbose --disable-libtool-wrappers
--disable-parallel-test-execution; SSL adds --enable-libevent --enable-ssl):

                                       gcc       clang
centos:7      SSL      autotools      Success   (not run)
centos:7      SSL      cmake          Success   (not run)
centos:7      non-SSL  autotools      Failed    (not run)
centos:7      non-SSL  cmake          Success   (not run)
ubuntu:16.04  SSL      autotools      Success   Failed
ubuntu:16.04  SSL      cmake          Success   Success
ubuntu:16.04  non-SSL  autotools      Success   Success

Re: [VOTE] Release Apache Mesos 1.7.3 (rc1)

2020-05-06 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. All builds passed!

*Revision*: 5f617044c969ebcfca281d043a2474c1a6b39f23

   - refs/tags/1.7.3-rc1

Configuration Matrix (common flags: --verbose --disable-libtool-wrappers
--disable-parallel-test-execution; SSL adds --enable-libevent --enable-ssl):

                                       gcc       clang
centos:7      SSL      autotools      Success   (not run)
centos:7      SSL      cmake          Success   (not run)
centos:7      non-SSL  autotools      Success   (not run)
centos:7      non-SSL  cmake          Success   (not run)
ubuntu:16.04  SSL      autotools      Success   Success
ubuntu:16.04  SSL      cmake          Success   Success
ubuntu:16.04  non-SSL  autotools      Success   Success
ubuntu:16.04  non-SSL  cmake          Success   Success



On Mon, May 4, 2020 at 12:48 PM Greg Mann  wrote:

> Hi all,
>
> Please vote on releasing the following 

Re: Number of tasks per executor and resource limits

2020-02-20 Thread Vinod Kone
Andrei, Qian: Can one of you answer the above question?

On Thu, Feb 20, 2020 at 7:15 PM Charles-François Natali 
wrote:

> Thanks for the quick reply!
>
> I think we're going to go for one executor per task for now, that's much
> simpler.
>
> Otherwise I was wondering - if we wanted to support multiple tasks per
> executor, could we leverage the mesos containeriser to easily put each task
> in its own cgroup?
>
>
>
> Is it just a matter of setting 'execu
> On Thu, 20 Feb 2020, 17:40 Vinod Kone,  wrote:
>
>> Hi Charles,
>>
>> We are actually working on a new feature that puts each of the tasks (of
>> the default executor) in its own cgroup so that they can be individually
>> limited. See https://issues.apache.org/jira/browse/MESOS-9916 . For
>> custom executor, you would be on your own to implement the same. Also, your
>> custom executor / scheduler should be able to limit one task per executor
>> if you so desire.
>>
>> On Thu, Feb 20, 2020 at 6:34 PM Charles-François Natali <
>> cf.nat...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is there a way to force Mesos to use one executor per task?
>>> The reason I would want to do that is for resources limits: for example,
>>> when using cgroup to limit CPU and memory, IIUC the containeriser sets
>>> limits corresponding to the sum of the resources allocated to the tasks
>>> managed by the underlying executor.
>>>
>>> Which means that for example if a task is allocated 1GB and another 2GB,
> if they are started by the same executor, the containeriser will
>>> limit the total memory for the two tasks (plus the executor) to 3GB. But,
>>> unless I'm missing something, nothing prevents a process (assuming one
>>> process per task with e.g. the command executor or a custom executor) from
>>> using more than its limit.
>>>
>>> Obviously it would be possible for the executor to enforce the
>>> individual limits itself using cgroup - does the command/default executor
>>> do that? - but in our case where we use a custom executor it would be quite
>>> painful.
>>>
>>> Unless I'm missing something?
>>>
>>> Thanks in advance for suggestions,
>>>
>>> Charles
>>>
>>
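The sum-of-allocations behaviour discussed in this thread can be shown with a few lines of arithmetic (illustrative numbers only, not Mesos code):

```python
# Illustrative only: with several tasks behind one executor, the
# containeriser's cgroup limit is the SUM of the tasks' allocations,
# so a single task can exceed its own share without hitting the limit.
tasks_gb = {"task-a": 1.0, "task-b": 2.0}      # per-task memory allocations
executor_limit_gb = sum(tasks_gb.values())     # one cgroup for both tasks: 3 GB
usage_gb = {"task-a": 2.5, "task-b": 0.4}      # task-a is over its own 1 GB share
over_cgroup_limit = sum(usage_gb.values()) > executor_limit_gb
print(executor_limit_gb, over_cgroup_limit)    # 3.0 False: no per-task enforcement
```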


Re: Number of tasks per executor and resource limits

2020-02-20 Thread Vinod Kone
Hi Charles,

We are actually working on a new feature that puts each of the tasks (of
the default executor) in its own cgroup so that they can be individually
limited. See https://issues.apache.org/jira/browse/MESOS-9916 . For custom
executor, you would be on your own to implement the same. Also, your custom
executor / scheduler should be able to limit one task per executor if you
so desire.

On Thu, Feb 20, 2020 at 6:34 PM Charles-François Natali 
wrote:

> Hi,
>
> Is there a way to force Mesos to use one executor per task?
> The reason I would want to do that is for resources limits: for example,
> when using cgroup to limit CPU and memory, IIUC the containeriser sets
> limits corresponding to the sum of the resources allocated to the tasks
> managed by the underlying executor.
>
> Which means that for example if a task is allocated 1GB and another 2GB,
> if they are started by the same executor, the containeriser will
> limit the total memory for the two tasks (plus the executor) to 3GB. But,
> unless I'm missing something, nothing prevents a process (assuming one
> process per task with e.g. the command executor or a custom executor) from
> using more than its limit.
>
> Obviously it would be possible for the executor to enforce the individual
> limits itself using cgroup - does the command/default executor do that? -
> but in our case where we use a custom executor it would be quite painful.
>
> Unless I'm missing something?
>
> Thanks in advance for suggestions,
>
> Charles
>


Re: cni iptables best practice

2020-02-05 Thread Vinod Kone
Hi Marc,

CNI 0.3 support is not on Mesosphere's near-term roadmap given our other
priorities. But if there's anyone in the community willing to work with you
to develop it, as the Apache Mesos project, we'll be happy to accept the
contribution (of course assuming it adheres to the project's quality
standards).

On Wed, Feb 5, 2020 at 8:57 AM Marc Roos  wrote:

>
> Is this possible? I would like to start using mesos in production to be
> honest.
>
>
>
> -Original Message-
> Sent: 30 January 2020 18:46
> To: Qian Zhang
> Cc: user; supp...@mesosphere.com
> Subject: RE: cni iptables best practice
>
>
> What about when I fund this? How much would it cost? Otherwise I need to
> spend time/money on making a custom cni plugin that is not even
> operating via standards.
>
> PS. I do not see the point of getting some external programmer, that
> needs to acquire specific knowledge on this subject first.
>
>
>
> -Original Message-
> Cc: user
> Subject: Re: cni iptables best practice
>
> I do not think we plan to do it in short term.
>
>
> Regards,
> Qian Zhang
>
>
> On Tue, Jan 28, 2020 at 1:54 AM Marc Roos 
> wrote:
>
>
>
>  Hi Qian,
>
> Any idea on when this cni 0.3 is going to be implemented? I saw
> the
>
> issue priority is Major, can't remember if it was always like
> this.
> But
> looks promising.
>
> Regards,
> Marc
>
>
>
>
> -Original Message-
> Sent: 14 December 2019 09:46
> To: user
> Subject: RE: cni iptables best practice
>
>
> Yes, yes I know, disaster. I wondered how or even if people are
> using
> iptables with tasks. Even on internal environment it could be nice
> to
> use not?
>
>
>
>
>


Re: Mesos master stops sending offers

2019-12-25 Thread Vinod Kone
The suggested info would be needed to triage this. 

Thanks,
Vinod

> On Dec 24, 2019, at 11:32 PM, Harjinder Singh Mistry 
>  wrote:
> 
> 
> We have been encountering an intermittent issue where Chronos stops getting 
> resource offers from Mesos master and the scheduled jobs get stuck in 'Queued'
> state at Chronos.
> 
> The sequence of observed events is as follows:
> 1. Chronos jobs are not executed by Mesos and status of jobs on Chronos
>dashboard is ‘Queued’.
> 2. Mesos master dashboard no longer shows agents i.e. slaves.
> 3. Mesos master logs show that master has not been sending resource offers to
>framework i.e. Chronos. But master keeps getting update from slaves for old
>tasks.
> 4. Zookeeper and slaves are not down. They are working fine.
> 5. After restarting Zookeeper, the system starts working fine. Chronos jobs
>start getting executed.
> 
> Please suggest a solution if this problem is known.
> 
> Can you please help us with the steps/info required for investigation ? We 
> plan
> to collect following when the issue happens next time:
> 
> 1. Logs from Chronos, Mesos Master, Mesos Slaves and Zookeeper nodes.
> 2. Check Mesos UI: http://mesos-master:5050 and see if any agents are listed
>and note status of jobs.
> 3. Hit the endpoint http://mesos-master:5050/state and save its output.
> 4. Check if Mesos masters and Zookeeper nodes are reachable (i.e. ping) from 
>Mesos slaves.
> 5. From output of step 3, determine the leader among the Mesos masters and
>check if it is sending offers:
>tail -f /var/log/mesos-log/mesos-master.INFO | grep -i sending
> 
> Thanks,
> Harjinder
> 
> 


Re: New Apache Mesos Fedora Package

2019-10-20 Thread Vinod Kone
This is awesome. Thanks for doing this Javi. And for the help Benjamin. 

Thanks,
Vinod

> On Oct 20, 2019, at 4:46 AM, Javi Roman  wrote:
> 
> Hi all!
> 
> I'd like to share with the Apache Mesos community the availability of
> the official Apache Mesos Package for Fedora 30 [1].
> 
> This new package is a little bit outdated: Apache Mesos 1.8.1 and
> Fedora 30 (Fedora is right now in 31th stable version), however this
> package is the beginning of support for Apache Mesos in the Fedora
> community. The update to the last Mesos version and the last Fedora
> releases is on going.
> 
> This package has three main issues/features:
> 
> 1. The Python 2 support is removed (so the old CLI is removed, the
> tools and framework binding based on Python 2 are removed).
> 2. The new CLI based on Python 3 is not included because the Apache
> 1.8.1 release shipped without these bits [2]
> 3. The NVML isolator support is disabled [3] because of Fedora license 
> policies.
> 
> The usage of this package is identical to the RPMs built by Apache
> Mesos community at Bintray.
> 
> I would like to express publicly my gratitude to Benjamin Bannier for
> his support in the construction of this package.
> 
> 
> [1] https://apps.fedoraproject.org/packages/mesos
> [2] https://issues.apache.org/jira/browse/MESOS-9958
> [3] https://issues.apache.org/jira/browse/MESOS-9978
> --
> Javi Roman
> 
> Twitter: @javiromanrh
> GitHub: github.com/javiroman
> Linkedin: es.linkedin.com/in/javiroman
> Big Data Blog: dataintensive.info
> Apache Id: javiroman


Re: [VOTE] Release Apache Mesos 1.9.0 (rc1)

2019-08-27 Thread Vinod Kone
I see. That reduces the risk considerably compared to what I originally
thought, but I guess it's still risky to introduce it so late?

On Tue, Aug 27, 2019 at 1:28 PM Benjamin Mahler  wrote:

> > We upgraded the version of the bundled boost very late in the release
> cycle
>
> Did we? We still bundle boost 1.65.0, just like we did during 1.8.x. We
> just adjusted our special stripped bundle to include additional headers.
>
> On Tue, Aug 27, 2019 at 1:39 PM Vinod Kone  wrote:
>
>> -1
>>
>> We upgraded the version of the bundled boost very late in the release
>> cycle
>> which doesn't give downstream customers (who also depend on boost) enough
>> time to vet any compatibility/perf/other issues. I propose we revert the
>> boost upgrade (and the corresponding code changes depending on the
>> upgrade)
>> in 1.9.x branch but keep it in the master branch.
>>
>> On Tue, Aug 27, 2019 at 4:18 AM Qian Zhang  wrote:
>>
>> > Hi all,
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 1.9.0.
>> >
>> >
>> > 1.9.0 includes the following:
>> >
>> >
>> 
>> > * Agent draining
>> > * Support configurable /dev/shm and IPC namespace.
>> > * Containerizer debug endpoint.
>> > * Add `no-new-privileges` isolator.
>> > * Client side SSL certificate verification in Libprocess.
>> >
>> > The CHANGELOG for the release is available at:
>> >
>> >
>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.9.0-rc1
>> >
>> >
>> 
>> >
>> > The candidate for Mesos 1.9.0 release is available at:
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz
>> >
>> > The tag to be voted on is 1.9.0-rc1:
>> > https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.9.0-rc1
>> >
>> > The SHA512 checksum of the tarball can be found at:
>> >
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz.sha512
>> >
>> > The signature of the tarball can be found at:
>> >
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz.asc
>> >
>> > The PGP key used to sign the release is here:
>> > https://dist.apache.org/repos/dist/release/mesos/KEYS
>> >
>> > The JAR is in a staging repository here:
>> > https://repository.apache.org/content/repositories/orgapachemesos-1255
>> >
>> > Please vote on releasing this package as Apache Mesos 1.9.0!
>> >
>> > The vote is open until Friday, April 30 and passes if a majority of at
>> > least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Mesos 1.9.0
>> > [ ] -1 Do not release this package because ...
>> >
>> >
>> > Thanks,
>> > Qian and Gilbert
>> >
>>
>


Re: [VOTE] Release Apache Mesos 1.9.0 (rc1)

2019-08-27 Thread Vinod Kone
-1

We upgraded the version of the bundled boost very late in the release cycle
which doesn't give downstream customers (who also depend on boost) enough
time to vet any compatibility/perf/other issues. I propose we revert the
boost upgrade (and the corresponding code changes depending on the upgrade)
in 1.9.x branch but keep it in the master branch.

On Tue, Aug 27, 2019 at 4:18 AM Qian Zhang  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.9.0.
>
>
> 1.9.0 includes the following:
>
> 
> * Agent draining
> * Support configurable /dev/shm and IPC namespace.
> * Containerizer debug endpoint.
> * Add `no-new-privileges` isolator.
> * Client side SSL certificate verification in Libprocess.
>
> The CHANGELOG for the release is available at:
>
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.9.0-rc1
>
> 
>
> The candidate for Mesos 1.9.0 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz
>
> The tag to be voted on is 1.9.0-rc1:
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.9.0-rc1
>
> The SHA512 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz.sha512
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1255
>
> Please vote on releasing this package as Apache Mesos 1.9.0!
>
> The vote is open until Friday, April 30 and passes if a majority of at
> least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.9.0
> [ ] -1 Do not release this package because ...
>
>
> Thanks,
> Qian and Gilbert
>


Re: Draining: Failed to validate master::Call: Expecting 'type' to be present

2019-08-07 Thread Vinod Kone
Please read the "maintenance primitives" section in this doc
http://mesos.apache.org/documentation/latest/maintenance/ and let us know
if you have unanswered questions.

On Wed, Aug 7, 2019 at 4:59 PM Marc Roos  wrote:

>
>  I seem to be able to add a maintenance schedule, and get also a report
> on '{"down_machines":[{"hostname":"m02.local"}]}' but I do not see tasks
> migrate to other hosts. Or is this not the purpose of maintenance mode
> in 1.8? Just to make sure no new tasks will be launched on hosts
> scheduled for maintenance?
>
>
>
> -Original Message-
> From: Chun-Hung Hsiao [mailto:chhs...@apache.org]
> Sent: woensdag 7 augustus 2019 22:59
> To: user
> Subject: Re: Draining: Failed to validate master::Call: Expecting 'type'
> to be present
>
> Hi Marc.
>
> Agent draining is a Mesos 1.9 feature and is only available on the
> current Mesos master branch.
> Please see https://issues.apache.org/jira/browse/MESOS-9814.
>
> Best,
> Chun-Hung
>
> On Wed, Aug 7, 2019 at 1:35 PM Marc Roos 
> wrote:
>
>
>
> Should this be working in mesos 1.8?
>
> [@m01 ~]# curl --user test:x -X POST \
> >   https://m01.local:5050/api/v1 \
> >   --cacert /etc/pki/ca-trust/source/ca.crt \
> >   -H 'Accept: application/json' \
> >   -H 'content-type: application/json' -d '{
> >   "type": "DRAIN_AGENT",
> >   "drain_agent": {"agent_id": {
> > "value":"53336fcb-7756-4673-b9c7-177e04f34c3b-S1"
> >   }}}'
>
> Failed to validate master::Call: Expecting 'type' to be present
>
>
>
>
>
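For reference, the schedule format described in the maintenance doc can be built programmatically instead of hand-writing nanosecond JSON. A sketch under the assumption that the payload shape matches the doc; the hostname, IP, and times below are placeholders:

```python
# Build a /maintenance/schedule payload (shape per the maintenance doc).
# Hostname, IP, and times are placeholders, not real cluster values.
import json

NS_PER_S = 1_000_000_000

def window(hostname, ip, start_s, duration_s):
    """One maintenance window covering a single machine."""
    return {
        "machine_ids": [{"hostname": hostname, "ip": ip}],
        "unavailability": {
            "start": {"nanoseconds": start_s * NS_PER_S},
            "duration": {"nanoseconds": duration_s * NS_PER_S},
        },
    }

schedule = {"windows": [window("m02.local", "10.0.0.2", 1_700_000_000, 3600)]}
payload = json.dumps(schedule)
print(payload)  # POST this body to http://<master>:5050/maintenance/schedule
```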


Re: Advice on how to group nodes (maybe with fault domain)

2019-08-06 Thread Vinod Kone
Master should be able to make offers for agents in different zones. Not
sure what you are encountering?  Please read
http://mesos.apache.org/documentation/latest/fault-domains/ if you haven't
already.

On Tue, Aug 6, 2019 at 11:01 AM Marc Roos  wrote:

>
> I have a test environment of vm's and I added a bare metal server to it.
>
>
> I would like to be able to differentiate task launches on the vm's and
> the bare metal. I thought about configuring fault domains. But then I
> run into the problem that the one master I have currently is not able to
> service the zone of the vms and the zone of the bare metal server.
>
> Is it even possible to configure the master to service these two zone's?
> And how? Something like this?
>
> {"fault_domain":{"region":{"name": "test"},"zone":{"name": "*"}}}
>
> Or should I approach this differently?
>
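The region/zone JSON sketched above is the value the `--domain` flag on both mesos-master and mesos-agent expects, per the fault-domains doc. A hedged sketch building such flag values; the region and zone names are placeholders, and each node declares one concrete zone rather than a wildcard:

```python
# Build --domain flag values for master/agents (format per the
# fault-domains doc). Region/zone names here are placeholders.
import json

def domain_flag(region, zone):
    domain = {"fault_domain": {"region": {"name": region}, "zone": {"name": zone}}}
    return "--domain=" + json.dumps(domain)

print(domain_flag("test", "vms"))        # e.g. for the VM agents
print(domain_flag("test", "baremetal"))  # e.g. for the bare-metal agent
```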


Re: [VOTE] Release Apache Mesos 1.8.1 (rc1)

2019-07-10 Thread Vinod Kone
+1 (binding).

Tested in ASF CI. One build failed due to known flaky test
https://issues.apache.org/jira/browse/MESOS-9594


*Revision*: 4ae06448466408d9ec96ede953208057609f0744

   - refs/tags/1.8.1-rc1

Configuration Matrix (gcc / clang):
centos:7     --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl
             autotools: Success / Not run
             cmake:     Success / Not run
centos:7     --verbose --disable-libtool-wrappers --disable-parallel-test-execution
             autotools: Success / Not run
             cmake:     Success / Not run
ubuntu:16.04 --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl
             autotools: Success / Success
             cmake:     Success / Failed
ubuntu:16.04 --verbose --disable-libtool-wrappers --disable-parallel-test-execution
             autotools: Success / Success
             cmake:     Success / Success





On Wed, Jul 10, 2019 at 11:54 AM Benno Evers  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.8.1.
>
> We had a lot 

Re: How to Use 2 physical machine resource at the same time

2019-07-09 Thread Vinod Kone
Yes, If I understand your use case correctly.

You can also reach out to us in slack  if
you want a more synchronous conversation about this.

On Tue, Jul 9, 2019 at 10:19 AM Gokula Krishnan 
wrote:

> Dear All,
>
> Thank you so much for your response.
>
>
>
> I am not using Mesos but I am exploring if Mesos can be used for my
> requirement.
>
>
>
> *Current Setup/Environment*
>
> 2 physical machines each has RAM16gb, 1CPU(4core), Linux OS
>
> Both physical machine has same services running
>
> · apache httpd
>
> · 10+ web servers instances
>
> · (rdbms) database
>
> · rabbitmQ service
>
>
>
> At any point of time, only one physical machine is active (serves the
> request) and the other physical machine is in standby mode. All the
> requests are served by the active physical machine while the standby
> physical machine is unused.
>
> When the active physical machine goes down (fails), the standby machine
> become active and it servers the request.
>
>
>
> so at any point in time, only one physical machine is utilized.
>
>
>
> Using Mesos, is there a way to use both the physical machine resource at
> the same time.
>
>
>
> Thank you in advance,
>
> On Tue, Jul 9, 2019 at 1:29 AM Hans van den Bogert 
> wrote:
>
>> I think gokula isn't using mesos at all atm and is researching if there
>> are better options than his current failover environment.
>>
>> Under the above assumption:
>>
>> To answer gokula, yes mesos would allow you to use resources of multiple
>> machines, however I think the overhead of running  multiple mesos masters
>> (for failover like you have now) isn't worth it for two machines, though
>> that ultimately depends on the 'beefyness' of the hardware in question.
>>
>> It also depends on how you expect a mesos cluster to behave in comparison
>> to your current cluster. Can you elaborate on your current
>> setup/environment?
>>
>> Hans
>>
>
>
> --
> Thanks and Regards,
> Gokula
>


Re: How to Use 2 physical machine resource at the same time

2019-07-08 Thread Vinod Kone
Hi Gokula,

Not sure I follow what you are asking here. What do you mean by one node is
active and other passive at any point in time? Are you saying your
framework (marathon?) is launching all your web-servers on a single node
whereas you want them to be distributed evenly across 2 nodes? If yes, you
could look at using marathon constraints
 to balance
them better.

On Mon, Jul 8, 2019 at 11:29 AM Gokula Krishnan 
wrote:

> Dear All,
> Thanks in advance and need your inputs for my requirement.
>
> I have 2 nodes (physical machines)
> There 20+ web servers running
> 2 nodes are in active and passive mode
> Problem: At any point in time only one node is active and other node is
> passive.
>
> Solution: want to use the capacity of both the 2 nodes at the same time.
>
> Will Mesos dcos use the resource of 2 nodes at the same time ?
> How can i deploy webservers and use both the nodes resources.
>
> Thank you in advance
>
>
>
> --
> Thanks and Regards,
> Gokula
>
>
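A hedged sketch of what a Marathon app definition with such a constraint might look like; the app id, command, and resource sizes are placeholders, not taken from this thread:

```python
# Minimal Marathon app definition (placeholder values) using a hostname
# constraint so instances spread across agent nodes.
import json

app = {
    "id": "/webserver",
    "cmd": "python3 -m http.server $PORT0",
    "instances": 2,
    "cpus": 0.5,
    "mem": 256,
    # UNIQUE on hostname: at most one instance per agent host
    "constraints": [["hostname", "UNIQUE"]],
}
print(json.dumps(app, indent=2))
```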


Re: Design doc: Agent draining and deprecation of maintenance primitives

2019-06-14 Thread Vinod Kone
+1

Thanks,
Vinod

> On Jun 14, 2019, at 9:18 AM, Greg Mann  wrote:
> 
> Hi all,
> Myself and a few other committers spent some time revisiting the possibility 
> of implementing agent draining using maintenance windows, as well as 
> discussing the coexistence of the existing maintenance primitives with the 
> agent draining feature as it is currently designed. Ultimately, the use case 
> of an operator putting an agent into a draining state immediately and 
> indefinitely, with no concept of a maintenance window, seems to be valid. 
> That use case is a bit awkward to represent in terms of our existing 
> maintenance windows. So, our thought is that we can add the agent draining 
> feature as it is currently designed, in order to provide an automatic agent 
> draining primitive. We can then later on extend the maintenance schedules to 
> allow operators to specify that they would like to automatically drain agents 
> leading up to the maintenance window. At that point, we could make use of the 
> agent draining primitive to accomplish this.
> 
> For the time being, we would like to disallow any single agent from both 
> being present in the maintenance schedule and being put into an automatic 
> draining state. This gives us some time to figure out precisely how these two 
> features will interact so that we avoid the need to make breaking changes 
> down the road.
> 
> Let me know what you all think of the above plan. I like it because it allows 
> operators who are currently using the maintenance primitives to continue 
> doing so, accommodates the simple case of immediate agent draining in the 
> near future, and allows us to incorporate automatic draining into the 
> maintenance schedule later.
> 
> Cheers,
> Greg
> 
>> On Fri, Jun 14, 2019 at 4:18 PM Greg Mann  wrote:
>> Christoph,
>> Great to hear that you're using the maintenance primitives! It seems unwise 
>> for us to deprecate this part of the API given the fact that you and Maxime 
>> have both expressed a desire for it to stick around. I'll adjust the agent 
>> draining design doc to remove the deprecation of that feature. Many thanks 
>> for your feedback.
>> 
>> Greg
>> 
>>> On Fri, Jun 7, 2019 at 9:24 PM Heer, Christoph  
>>> wrote:
>>> Hi everyone,
>>> 
>>> my team and I implemented our own Mesos framework for task execution on our 
>>> bare-metal on-prem cluster.
>>> Especially for task processing workload with known or estimated task 
>>> duration, the available Mesos maintenance primitives are super powerful for 
>>> scheduler and operators. While developing the scheduler, I hadn't the 
>>> feeling it would be complex to support/respect maintenance windows. Already 
>>> the small logic "Should I launch task X with estimated runtime 3h on node Y 
>>> with scheduled maintenance in 40min?" saved us tons of aborted tasks. Our 
>>> hardware operations team also really likes the way to plan and express 
>>> maintenance windows upfront. Days before the actually maintenance they can 
>>> add the information and the node will be ready at that point in time. Also, 
>>> they can reboot the machines without the fear that any production workload 
>>> will be scheduled until they confirmed the end of the maintenance. But 
>>> looks like this would be also ensured by the new design.
>>> 
>>> In the past we already used another job orchestration system with a 
>>> draining approach similar to the design proposal. In nearly all cases the 
>>> operations team didn't manage to start the draining mode at the right time. 
>>> Either it was too early, and we didn't use available hardware resources or 
>>> it was too late and it unnecessarily interrupted productive workload. 
>>> Especially for long-running tasks which are expensive at restarting, it 
>>> wasn't a good way to mange scheduled down times.
>>> 
>>> I don't know the implementation within Mesos and therefore I can't judge 
>>> about the complexity but I think the main problem is that Mesos doesn't 
>>> provide an intuitive interface for managing maintenance windows. The HTTP 
>>> API isn't that complicated but you definitely need own or external tooling. 
>>> Probably most people are already deterred from the JSON syntax with 
>>> nanoseconds. Also, the lack of synchronisation of modifications can be a 
>>> problem and makes it harder to implement tooling around the API. A new more 
>>> fine-grain HTTP API would be a big improvement and would allow to implement 
>>> a nice looking interface within the Mesos UI.
>>> 
>>> It would be sad to see this great feature disappearing.
>>> 
>>> Best regards,
>>> Christoph
>>> 
>>> 
>>> Christoph Heer
>>> SAP SE, Dietmar-Hopp-Allee 16, 69190 Walldorf, Germany
>>> 
>>> Mandatory Disclosure Statement: www.sap.com/impressum
>>> This e-mail may contain trade secrets or privileged, undisclosed, or 
>>> otherwise 
>>> confidential information. If you have received this e-mail in error, you 
>>> are hereby 
>>> notified that any review, copying, or distribution 

Re: '*.json' endpoints removed in 1.7

2019-05-10 Thread Vinod Kone
I propose that we revert this change and keep the ".json" endpoints in
master branch and 1.8.x

My reasoning is that we have ecosystem components (e.g., mesos-dns, which
is yet to have a release with the fix) and, anecdotally, a bunch of custom
tooling at user sites that depend on these ".json" endpoints (esp.
/state.json). The amount of tech debt we saved or consistency we achieved
in the codebase by doing this is not worth the tradeoff of breaking users'
tooling, in my opinion. We could revisit this if and when we do a Mesos 2.0.

On Wed, Aug 8, 2018 at 9:25 AM Alex Rukletsov  wrote:

> Folks,
>
> The long ago deprecated '*.json' endpoints will be removed in Mesos 1.7.0.
> Please use their non-'.json' counterparts instead.
>
> Commit:
> https://github.com/apache/mesos/commit/42551cb5290b7b04101f7d800b4b8fd573e47b91
> JIRA ticket: https://issues.apache.org/jira/browse/MESOS-4509
>
> Alex.
>


Re: [VOTE] Release Apache Mesos 1.8.0 (rc3)

2019-04-26 Thread Vinod Kone
+1 (binding)

1 failed build was due to a known flaky test.
*Revision*: acefa90695a32f8e8d6361f8192a6522aeaadbb9

   - refs/tags/1.8.0-rc3

Configuration Matrix (gcc / clang):
centos:7     --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl
             autotools: Success / Not run
             cmake:     Success / Not run
centos:7     --verbose --disable-libtool-wrappers --disable-parallel-test-execution
             autotools: Success / Not run
             cmake:     Failed / Not run
ubuntu:16.04 --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl
             autotools: Success / Success
             cmake:     Success / Success
ubuntu:16.04 --verbose --disable-libtool-wrappers --disable-parallel-test-execution
             autotools: Success / Success
             cmake:     Success / Success


On Fri, Apr 26, 2019 at 1:04 PM Benno Evers  wrote:

> Addendum:
> The vote is open until Thursday, May 2nd.
>
> On Fri, Apr 26, 2019 at 6:28 PM Benno Evers  wrote:
>
> > Hi all,
> >
> > Please vote on releasing the following 

Re: Slack upgrade to Standard plan. Thanks Criteo

2019-04-25 Thread Vinod Kone
Hi Marc,

Thanks for bringing this up. I think vendor lockin is a valid concern but
when we weighed that against the modern communication platform that slack
provides which will help engage with our community better, we chose the
latter. And I must say, the community engagement after Slack transition has
been better than what it was before during IRC.

On Tue, Apr 23, 2019 at 1:15 PM Marc Roos  wrote:

>
>
> I don't think it is good to get vendor locked-in. First and foremost
> data in slack is not publicly accessible.
>
>
>
>
> -Original Message-
> From: Jie Yu [mailto:yujie@gmail.com]
> Sent: 23 April 2019 19:51
> To: user
> Subject: Re: Slack upgrade to Standard plan. Thanks Criteo
>
> Thanks Criteo friends!
>
>
> On Tue, Apr 23, 2019 at 10:13 AM Vinod Kone 
> wrote:
>
>
> Hi folks,
>
> As you probably realized today, we got our Slack upgraded from
> "free" plan to "standard" plan, which allows us to have unlimited
> message history and better analytics among other things! This would be
> great for our community.
>
> This upgrade has been made possible due to a general
> contribution/donation from folks at Criteo
> <http://www.criteo.com/> . Criteo has been a long time user of Apache
> Mesos and luckily for us, they wanted to contribute back to the ecosystem.
> We will update the website with the thanks shortly.
>
> Hope you'll take advantage of the standard plan.
>
> Thanks,
>
> Vinod
>
>
>
>
>


Slack upgrade to Standard plan. Thanks Criteo

2019-04-23 Thread Vinod Kone
Hi folks,

As you probably realized today, we got our Slack upgraded from "free" plan
to "standard" plan, which allows us to have unlimited message history and
better analytics among other things! This would be great for our community.

This upgrade has been made possible due to a general contribution/donation
from folks at Criteo . Criteo has been a long time
user of Apache Mesos and luckily for us, they wanted to contribute back to
the ecosystem. We will update the website with the thanks shortly.

Hope you'll take advantage of the standard plan.

Thanks,
Vinod


Re: [VOTE] Release Apache Mesos 1.8.0 (rc2)

2019-04-18 Thread Vinod Kone
+1 (binding)

Ran on ASF CI.

*Revision*: f5920ad1a7cbcd2423c30465dcf14948e392081b

   - refs/tags/1.8.0-rc2

Configuration Matrix (gcc / clang):
centos:7     --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl
             autotools: Success / Not run
             cmake:     Success / Not run
centos:7     --verbose --disable-libtool-wrappers --disable-parallel-test-execution
             autotools: Success / Not run
             cmake:     Success / Not run
ubuntu:16.04 --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl
             autotools: Success / Success
             cmake:     Success / Success
ubuntu:16.04 --verbose --disable-libtool-wrappers --disable-parallel-test-execution
             autotools: Success / Success
             cmake:     Success / Success


On Thu, Apr 18, 2019 at 8:00 AM Benno Evers  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.8.0.
>
>
> 1.8.0 includes the following:
>
> 

Re: Subject: [VOTE] Release Apache Mesos 1.8.0 (rc1)

2019-04-15 Thread Vinod Kone
+1 (binding)

Ran it on ASF CI.

*Revision*: 85462fc183a60ae18d85729bccb1fffb59aa572c

   - refs/tags/1.8.0-rc1

Configuration Matrix (gcc / clang):

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Success / Not run
  cmake:     Success / Not run

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Success / Not run
  cmake:     Success / Not run

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Success / Success
  cmake:     Success / Success

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Success / Success
  cmake:     Success / Success


On Mon, Apr 15, 2019 at 1:26 PM Benno Evers  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.8.0.
>
>
> 1.8.0 includes the following:
>
> 

Re: [VOTE] Release Apache Mesos 1.5.3 (rc1)

2019-03-07 Thread Vinod Kone
+1 (binding)

Ran in ASF CI. Saw some flaky tests but otherwise looks good.

*Revision*: b1dbba03af23b0222d11f2b7ae936d77ef42650d

   - refs/tags/1.5.3-rc1

Configuration Matrix (gcc / clang):

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Success / Not run
  cmake:     Success / Not run

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Success / Not run
  cmake:     Success / Not run

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Success / Success
  cmake:     Success / Success

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Success / Success
  cmake:     Success / Failed


On Wed, Mar 6, 2019 at 7:33 AM Gilbert Song  wrote:

>  Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.5.3.
>
> 1.5.3 includes the following:
>
> 

Re: [VOTE] Release Apache Mesos 1.6.2 (rc1)

2019-02-20 Thread Vinod Kone
+1 (binding)

Thanks for the update Greg.

On Wed, Feb 20, 2019 at 11:41 AM Greg Mann  wrote:

> It appears to be a flaky test; that particular failure hasn't come up in
> the CI builds that I ran, or in my own manual testing. Just now, I was able
> to get that test to fail after many repetitions, but with a different
> error. I filed ticket MESOS-9589
> <https://issues.apache.org/jira/browse/MESOS-9589> to track.
>
> Cheers,
> Greg
>
> On Tue, Feb 19, 2019 at 2:41 PM Vinod Kone  wrote:
>
> > Found a flaky test
> > <
> >
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/65/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose%20--disable-libtool-wrappers%20--disable-parallel-test-execution,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:16.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console
> > >in
> > ASF CI. Doesn't seem to be a known issue according to JIRA.
> >
> > @Greg Mann   can you please confirm if this is a
> flaky
> > test or something new?
> >
> >
> >
> > On Tue, Feb 19, 2019 at 1:56 PM Greg Mann  wrote:
> >
> > > Hi all,
> > >
> > > Please vote on releasing the following candidate as Apache Mesos 1.6.2.
> > >
> > >
> > > 1.6.2 includes a number of bug fixes since 1.6.1; the CHANGELOG for the
> > > release is available at:
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.6.2-rc1
> > >
> > >
> >
> 
> > >
> > > The candidate for Mesos 1.6.2 release is available at:
> > >
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz
> > >
> > > The tag to be voted on is 1.6.2-rc1:
> > > https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.6.2-rc1
> > >
> > > The SHA512 checksum of the tarball can be found at:
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz.sha512
> > >
> > > The signature of the tarball can be found at:
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz.asc
> > >
> > > The PGP key used to sign the release is here:
> > > https://dist.apache.org/repos/dist/release/mesos/KEYS
> > >
> > > The JAR is in a staging repository here:
> > > https://repository.apache.org/content/repositories/orgapachemesos-1246
> > >
> > > Please vote on releasing this package as Apache Mesos 1.6.2!
> > >
> > > The vote is open until Fri Feb 22 11:54 PST 2019, and passes if a
> > majority
> > > of at least 3 +1 PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Mesos 1.6.2
> > > [ ] -1 Do not release this package because ...
> > >
> > > Thanks,
> > > Greg
> > >
> >
>


Re: [VOTE] Release Apache Mesos 1.7.2 (rc1)

2019-02-20 Thread Vinod Kone
+1

Ran this on ASF CI.

The red builds are a flaky infra issue and a known flaky test.

*Revision*: 58cc918e9acc2865bb07047d3d2dff156d1708b2

   - refs/tags/1.7.2-rc1

Configuration Matrix (gcc / clang):

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Failed / Not run
  cmake:     Success / Not run

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Success / Not run
  cmake:     Success / Not run

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Success / Success
  cmake:     Success / Success

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Success / Success
  cmake:     Success / Failed



On Tue, Feb 19, 2019 at 5:00 PM Gastón Kleiman  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.7.2.
>
>

Re: [VOTE] Release Apache Mesos 1.6.2 (rc1)

2019-02-19 Thread Vinod Kone
Found a flaky test in ASF CI. It doesn't seem to be a known issue according
to JIRA.

@Greg Mann   can you please confirm if this is a flaky
test or something new?



On Tue, Feb 19, 2019 at 1:56 PM Greg Mann  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.6.2.
>
>
> 1.6.2 includes a number of bug fixes since 1.6.1; the CHANGELOG for the
> release is available at:
>
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.6.2-rc1
>
> 
>
> The candidate for Mesos 1.6.2 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz
>
> The tag to be voted on is 1.6.2-rc1:
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.6.2-rc1
>
> The SHA512 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz.sha512
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1246
>
> Please vote on releasing this package as Apache Mesos 1.6.2!
>
> The vote is open until Fri Feb 22 11:54 PST 2019, and passes if a majority
> of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.6.2
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Greg
>
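
For anyone verifying a candidate before voting, the SHA512 check described
in the vote email can be sketched in Python with the standard library. The
file names below are placeholders; the real tarball and the published
checksum come from the dist.apache.org URLs in the email:

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Compute the hex SHA512 of a file, streaming to keep memory flat."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def verify(tarball_path, published_checksum):
    """Compare the local tarball's digest with the published .sha512 value.

    The published file may contain 'HASH  filename'; keep only the hash.
    """
    expected = published_checksum.split()[0].lower()
    return sha512_of(tarball_path) == expected
```

Signature verification would additionally run `gpg --verify` against the
`.asc` file after importing the KEYS file linked in the email.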


Re: [VOTE] Release Apache Mesos 1.4.3 (rc2)

2019-02-15 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. Red builds are known flaky tests or unrelated infra
issues.


*Revision*: 1fee9b5365bf2424e4768dc1d5209c6c78dfece6

   - refs/tags/1.4.3-rc2

Configuration Matrix (gcc / clang):

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Failed / Not run
  cmake:     Failed / Not run

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Failed / Not run
  cmake:     Success / Not run

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Success / Success
  cmake:     Success / Success

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Failed / Success
  cmake:     Success / Failed


On Wed, Feb 13, 2019 at 8:49 PM Meng Zhu  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.4.3.
>
> 1.4.3 includes the following:
>
> 

Re: Welcome Benno Evers as committer and PMC member!

2019-01-30 Thread Vinod Kone
Congratulations Benno!

On Wed, Jan 30, 2019 at 3:21 PM Alex R  wrote:

> Folks,
>
> Please welcome Benno Evers as an Apache committer and PMC member of the
> Apache Mesos!
>
> Benno has been active in the project for more than a year now and has made
> significant contributions, including:
>   * Agent reconfiguration, MESOS-1739
>   * Memory profiling, MESOS-7944
>   * "/state" performance improvements, MESOS-8345
>
> I have been working closely with Benno, paired up on, and shepherded some
> of his work. Benno has very strong technical knowledge in several areas and
> he is willing to share it with others and help his peers.
>
> Benno, thanks for all your contributions so far and looking forward to
> continuing to work with you on the project!
>
> Alex.
>


Re: [VOTE] Release Apache Mesos 1.4.3 (rc1)

2019-01-29 Thread Vinod Kone
+1

Tested in ASF CI. Red builds are known flakes.

*Revision*: fcfe1904e45726ca96fc6707d8b227a16664f4f8

   - refs/tags/1.4.3-rc1

Configuration Matrix (gcc / clang):

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Failed / Not run
  cmake:     Success / Not run

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Success / Not run
  cmake:     Success / Not run

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Failed / Success
  cmake:     Success / Success

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Failed / Failed
  cmake:     Success / Success



On Mon, Jan 28, 2019 at 2:48 AM Alex Rukletsov  wrote:

> This will be the last official 1.4.x release. Even though we agreed to
> keep the branch and occasionally back port fixes to it post last release,
> maybe it makes sense to 

Re: [VOTE] Release Apache Mesos 1.7.1 (rc2)

2019-01-16 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. Failing builds are due to missed SSL dep in the docker
build file and a flaky test.

*Revision*: d5678c3c5500cec72e22e775d9d048c55c128954

   - refs/tags/1.7.1-rc2

Configuration Matrix (gcc / clang):

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Success / Not run
  cmake:     Success / Not run

centos:7, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Success / Not run
  cmake:     Success / Not run

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl:
  autotools: Failed / Failed
  cmake:     Failed / Failed

ubuntu:16.04, --verbose --disable-libtool-wrappers --disable-parallel-test-execution:
  autotools: Success / Failed
  cmake:     Success / Success


On Tue, Jan 15, 2019 at 8:30 PM Chun-Hung Hsiao  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.7.1.
>
>
> 1.7.1 includes the 

Re: [VOTE] Release Apache Mesos 1.5.2 (rc3)

2019-01-16 Thread Vinod Kone
+1  (binding)

Passed in ASF CI. Known flaky tests, but otherwise builds look good.

*Revision*: 3088295d4156eb58d092ad9b3529b85fd33bd36e

   - refs/tags/1.5.2-rc3

Configuration Matrix (gcc / clang):

centos:7, --verbose --enable-libevent --enable-ssl:
  autotools: Failed / Not run
  cmake:     Success / Not run

centos:7, --verbose:
  autotools: Failed / Not run
  cmake:     Success / Not run

ubuntu:14.04, --verbose --enable-libevent --enable-ssl:
  autotools: Failed / Success
  cmake:     Success / Success

ubuntu:14.04, --verbose:
  autotools: Success / Success
  cmake:     Success / Success



On Wed, Jan 16, 2019 at 11:04 AM Jie Yu  wrote:

> +1
>
> make dist check on macOS Mojave
>
> On Tue, Jan 15, 2019 at 12:57 AM Gilbert Song  wrote:
>
>>  Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.5.2.
>>
>> 1.5.2 includes the following:
>>
>> 
>> *Announce major bug fixes here*
>> https://jira.apache.org/jira/issues/?filter=12345443
>>
>> The CHANGELOG for the release is available at:
>>
>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.5.2-rc3
>>
>> 
>>
>> The candidate for Mesos 1.5.2 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz
>>
>> The tag to be voted on is 1.5.2-rc3:
>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.5.2-rc3
>>
>> The SHA512 checksum of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz.sha512
>>
>> The signature of the tarball can be found at:
>>
>> 

Re: [Community WG] Reminder: Meeting today at 10:30 AM PST

2019-01-14 Thread Vinod Kone
Cloud recording for those who missed it:
https://zoom.us/recording/play/8z2oHhJZIkf0xnJZ40-NtzlNdwn9ev_FuGQnlYkbdp4AFqpHbfWXdO46Us3-MyNu?continueMode=true

On Mon, Jan 14, 2019 at 11:49 AM Vinod Kone  wrote:

> Hi folks,
>
> This is a reminder that we have community WG meeting today at 10:30 AM PST.
>
> The agenda for the meeting is here
> <https://docs.google.com/document/d/1vgi434dYkkZHs49EK4F4eMmM-3JG4f3qg-N5En-4ubg/edit#>.
> Please feel free to add more items to the agenda.
>
> See you there,
>
> Vinod
>


[DISCUSS] Updating the support and release policy

2019-01-14 Thread Vinod Kone
Hi folks,

As discussed in the Community WG meeting today, I wanted to send out a
proposal for updating the current support and release policy.

Context: According to our release policy, the latest released version and
last 2 released versions are supported at any given time. With an expected
timeline of a minor release every 3 months, that means a minor release is
typically supported for 9 months. So far, we've indicated that a release is
unsupported by deleting the corresponding release branch in our repository.

The new proposal is as follows:

   - Keep unsupported release branches instead of deleting them. We would
   make it clear in the CHANGELOG and on the downloads page of our website
   which releases are supported and which are not.
   - If a committer would like to backport a fix to an unsupported release
   branch, they can do so. Such a backport is not required, but a committer
   can do it if they wish. The contributor and committer should have a
   dialog regarding this.
   - CI will keep running against both supported and unsupported release
   branches (as it does today), and any issues that arise will be fixed on
   a best-effort basis.
   - A committer can ask a contributor to submit a backport review in case
   the backport is complicated. Our review tooling (post-reviews and
   ReviewBot) will be updated to make this possible.

Based on our experience with the current policy over the last couple of
years and the reality of how some organizations are using Mesos, we
believe these tweaks will make the policy more practical and useful.

Please let us know your thoughts by replying here or chatting in #community
in our slack channel.

Thanks,
Vinod (on behalf of Community WG)


[Community WG] Reminder: Meeting today at 10:30 AM PST

2019-01-14 Thread Vinod Kone
Hi folks,

This is a reminder that we have community WG meeting today at 10:30 AM PST.

The agenda for the meeting is here. Please feel free to add more items to
the agenda.

See you there,

Vinod


Re: [VOTE] Release Apache Mesos 1.7.1 (rc1)

2019-01-02 Thread Vinod Kone
Also, another error:
<https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/57/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console>

/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:
In function 'tsi_result ssl_handshaker_extract_peer(tsi_handshaker*,
tsi_peer*)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1011:71:
error: 'SSL_get0_alpn_selected' was not declared in this scope
   SSL_get0_alpn_selected(impl->ssl, _selected, _selected_len);
   ^
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:
In function 'tsi_result tsi_create_ssl_client_handshaker_factory(const
tsi_ssl_pem_key_cert_pair*, const char*, const char*, const char**,
uint16_t, tsi_ssl_client_handshaker_factory**)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1417:73:
error: 'SSL_CTX_set_alpn_protos' was not declared in this scope
   static_cast(impl->alpn_protocol_list_length))) {
 ^
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:
In function 'tsi_result
tsi_create_ssl_server_handshaker_factory_ex(const
tsi_ssl_pem_key_cert_pair*, size_t, const char*,
tsi_client_certificate_request_type, const char*, const char**,
uint16_t, tsi_ssl_server_handshaker_factory**)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1557:79:
error: 'SSL_CTX_set_alpn_select_cb' was not declared in this scope

server_handshaker_factory_alpn_callback, impl);
   ^
make[7]: *** [CMakeFiles/grpc.dir/src/core/tsi/ssl_transport_security.cc.o]
Error 1
make[7]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[6]: *** [CMakeFiles/grpc.dir/all] Error 2
make[6]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[5]: *** [CMakeFiles/grpc.dir/rule] Error 2
make[5]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[4]: *** [grpc] Error 2
make[4]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[3]: *** [3rdparty/grpc-1.10.0/src/grpc-1.10.0-stamp/grpc-1.10.0-build]
Error 2
make[3]: Leaving directory `/mesos/build'
make[2]: *** [3rdparty/CMakeFiles/grpc-1.10.0.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs
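
(Observation, not part of the original thread: the symbols the compiler
cannot find above — `SSL_get0_alpn_selected`, `SSL_CTX_set_alpn_protos`,
`SSL_CTX_set_alpn_select_cb` — are all ALPN entry points, which OpenSSL
only gained in 1.0.2; Ubuntu 14.04 ships 1.0.1, which would explain grpc
failing to build there. A quick probe of the OpenSSL a host links against,
assuming the local Python is built against the same library, which may not
hold on every box:)

```python
import ssl


def alpn_report():
    """Report the linked OpenSSL version and whether it exposes ALPN.

    ALPN (used by grpc's ssl_transport_security.cc) requires
    OpenSSL >= 1.0.2; on older libraries HAS_ALPN is False.
    """
    return {
        "openssl": ssl.OPENSSL_VERSION,
        "version_info": ssl.OPENSSL_VERSION_INFO,
        "has_alpn": ssl.HAS_ALPN,
    }


print(alpn_report())
```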



On Wed, Jan 2, 2019 at 3:35 PM Vinod Kone  wrote:

> I see an issue
> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/57/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console>
> with clang compiler when running it in ASF CI. Is this a known issue?
>
> ../../src/resource_provider/storage/provider.cpp:3190:5: error: conditional 
> expression is ambiguous; 'Future>' can be converted 
> to 'Future>' and vice versa
> ? createVolume(
> ^ ~
>
>
>
> On Wed, Jan 2, 2019 at 2:11 PM Benjamin Mahler  wrote:
>
>> +1 (binding)
>>
>> make check passes on macOS 10.14.2
>>
>> $ clang++ --version
>> Apple LLVM version 10.0.0 (clang-1000.10.44.4)
>> Target: x86_64-apple-darwin18.2.0
>> Thread model: posix
>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>>
>> $ ./configure CC=clang CXX=clang++ CXXFLAGS="-Wno-deprecated-declarations"
>> --disable-python --disable-java --with-apr=/usr/local/opt/apr/libexec
>> --with-svn=/usr/local/opt/subversion && make check -j12
>> ...
>> [  PASSED  ] 1956 tests.
>>
>> On Fri, Dec 21, 2018 at 5:48 PM Chun-Hung Hsiao 
>> wrote:
>>
>> > Hi all,
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 1.7.1.
>> >
>> >
>> > 1.7.1 includes the following:
>> >
>> >
>> 
>> > * This is a bug fix release. Also includes performance and API
>> >   improvements:
>> >
>> >   * **Allocator**: Improved allocation cycle time substantially
>> > (see MESOS-9239 and MESOS-9249). These reduce the allocation
>> > cycle time in some benchmarks by 80%.
>> >
>> >   * **Scheduler API**: Improved the experimental `C

Re: [VOTE] Release Apache Mesos 1.7.1 (rc1)

2019-01-02 Thread Vinod Kone
I see an issue with the clang compiler when running it in ASF CI. Is this a
known issue?

../../src/resource_provider/storage/provider.cpp:3190:5: error:
conditional expression is ambiguous; 'Future>'
can be converted to 'Future>' and vice versa
? createVolume(
^ ~



On Wed, Jan 2, 2019 at 2:11 PM Benjamin Mahler  wrote:

> +1 (binding)
>
> make check passes on macOS 10.14.2
>
> $ clang++ --version
> Apple LLVM version 10.0.0 (clang-1000.10.44.4)
> Target: x86_64-apple-darwin18.2.0
> Thread model: posix
> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>
> $ ./configure CC=clang CXX=clang++ CXXFLAGS="-Wno-deprecated-declarations"
> --disable-python --disable-java --with-apr=/usr/local/opt/apr/libexec
> --with-svn=/usr/local/opt/subversion && make check -j12
> ...
> [  PASSED  ] 1956 tests.
>
> On Fri, Dec 21, 2018 at 5:48 PM Chun-Hung Hsiao 
> wrote:
>
> > Hi all,
> >
> > Please vote on releasing the following candidate as Apache Mesos 1.7.1.
> >
> >
> > 1.7.1 includes the following:
> >
> >
> 
> > * This is a bug fix release. Also includes performance and API
> >   improvements:
> >
> >   * **Allocator**: Improved allocation cycle time substantially
> > (see MESOS-9239 and MESOS-9249). These reduce the allocation
> > cycle time in some benchmarks by 80%.
> >
> >   * **Scheduler API**: Improved the experimental `CREATE_DISK` and
> > `DESTROY_DISK` operations for CSI volume recovery (see MESOS-9275
> > and MESOS-9321). Storage local resource providers now return disk
> > resources with the `source.vendor` field set, so frameworks needs to
> > upgrade the `Resource` protobuf definitions.
> >
> >   * **Scheduler API**: Offer operation feedbacks now present their agent
> > IDs and resource provider IDs (see MESOS-9293).
> >
> >
> > The CHANGELOG for the release is available at:
> >
> >
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.7.1-rc1
> >
> >
> 
> >
> > The candidate for Mesos 1.7.1 release is available at:
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.7.1-rc1/mesos-1.7.1.tar.gz
> >
> > The tag to be voted on is 1.7.1-rc1:
> > https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.7.1-rc1
> >
> > The SHA512 checksum of the tarball can be found at:
> >
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.7.1-rc1/mesos-1.7.1.tar.gz.sha512
> >
> > The signature of the tarball can be found at:
> >
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.7.1-rc1/mesos-1.7.1.tar.gz.asc
> >
> > The PGP key used to sign the release is here:
> > https://dist.apache.org/repos/dist/release/mesos/KEYS
> >
> > The JAR is in a staging repository here:
> >
> >
> https://repository.apache.org/content/repositories/releases/org/apache/mesos/mesos/1.7.1-rc1/
> >
> > Please vote on releasing this package as Apache Mesos 1.7.1!
> >
> > To accommodate for the holidays, the vote is open until Mon Dec 31
> > 14:00:00 PST 2018 and passes if a majority of at least 3 +1 PMC votes are
> > cast.
> >
> > [ ] +1 Release this package as Apache Mesos 1.7.1
> > [ ] -1 Do not release this package because ...
> >
> > Thanks,
> > Chun-Hung & Gaston
> >
>


Re: FW: full Zookeeper authentication

2018-12-06 Thread Vinod Kone
Dmitrii.

That approach sounds reasonable. Would you like to work on this? Are you
looking for a reviewer/shepherd?

On Thu, Dec 6, 2018 at 11:28 AM Kishchukov, Dmitrii (NIH/NLM/NCBI) [C] <
dmitrii.kishchu...@nih.gov> wrote:

> Mesos allows using only the digest authentication scheme for Zookeeper,
> which is limiting because Zookeeper has quite a flexible security model.
> It is easy to write your own authenticator with its own scheme name.
>
> To fully support Zookeeper authentication, Mesos has to pass two items to
> Zookeeper: a scheme and credentials.
> Credentials can have different formats depending on the authentication
> scheme. For the digest scheme the format is ‘login:password’.
>
> All Mesos should do is pass the scheme and credentials through to
> Zookeeper.
>
> Another improvement might be to configure credentials via a file instead
> of a URI.
>
> For example, it could be two command line options:
> --zk_auth_scheme and --zk_auth_credentials
>
> They could be used like this:
> --zk_auth_scheme=some_custom_scheme --zk_auth_credentials=filename
>
> --zk_auth_credentials could just read the entire contents of the file as
> the credentials string.
>
> The Authentication class in Mesos already contains everything we need. The
> problem is what Mesos passes to the constructor.
>
>
> --
>
> Dmitrii Kishchukov.
>
>
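
For context on the digest scheme Mesos hard-codes today: ZooKeeper's
digest provider stores ACL ids as `user:base64(sha1("user:password"))`. A
standard-library sketch of that derivation (mirroring ZooKeeper's
`DigestAuthenticationProvider.generateDigest`; this is illustrative, not
Mesos code):

```python
import base64
import hashlib


def zk_digest_acl_id(user, password):
    """Derive the ACL id that ZooKeeper's 'digest' scheme expects.

    ZooKeeper hashes the raw 'user:password' credential with SHA-1 and
    base64-encodes it. A custom scheme, as proposed in the thread above,
    would replace this derivation with its own credential format, which
    is why passing the scheme/credentials through opaquely makes sense.
    """
    credential = "%s:%s" % (user, password)
    digest = hashlib.sha1(credential.encode("utf-8")).digest()
    return "%s:%s" % (user, base64.b64encode(digest).decode("ascii"))
```

Under the proposal, a `--zk_auth_credentials` file would carry the raw
scheme-specific credential string, which Mesos would hand to the ZooKeeper
client's add-auth call untouched.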


Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-03 Thread Vinod Kone
Thanks Meng for the explanation.

I imagine most frameworks do not remember what they filtered, much less
figure out how previously filtered resources can satisfy new operations.
That sounds complicated!

But I like your example. So a suggestion we could make to framework authors
could be to use CLEAR_FILTERS when they have new work, e.g., scale up/down
or a new app (they might want to use this even if they aren't suppressed!),
and to use UNSUPPRESS when they are rescheduling old work?

Thoughts?

On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu  wrote:

> Hi Vinod:
>
> Yeah, `CLEAR_FILTERS` sounds good.
>
> UNSUPPRESS should be used whenever currently suppressed framework wants to
> resume getting offers after a previous SUPPRESS call.
>
> As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to
> call it whenever the framework wants to clear all the existing filters.
>
> To elaborate, frameworks decline offers and accumulate filters while
> trying to satisfy a particular set of requirements/constraints to perform
> an operation. Once the operation is done and the next operation comes, if
> the new operation has the same (or strictly more) resource
> requirements/constraints compared to the last one, then it is more
> efficient to KEEP the existing filters instead of getting useless offers
> and rebuilding the filters again.
>
> On the other hand, if the requirements/constraints are different (i.e. some
> of the previous requirements could be loosened), then the existing filters
> no longer make sense. Then it might be a good idea to clear all the
> existing filters to improve the chance of getting more offers.
>
> Note, although we introduce `CLEAR_FILTERS` as part of decoupling the
> `REVIVE` call, its usage should be independent of suppression/revival. The
> decision to clear the filters only depends on whether the existing filters
> make sense for the current operation constraints/requirements.
>
> Examples:
> If a framework first launches a task, then wants to launch a replacement
> task (because the first task failed), then it should keep the filters built
> up during the first launch. However, if the framework wants to launch a
> second task with a completely different resource profile, then clearing
> filters might help to get more (otherwise filtered) offers and hence speed
> up the deployment.
>
> -Meng
>
> On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone  wrote:
>
> > Hi Meng,
> >
> > What would be the recommendation for framework authors on when to use
> > UNSUPPRESS vs CLEAR_FILTER?
> >
> > Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
> >
> > On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu  wrote:
> >
> >> Hi:
> >>
> >> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress and
> >> clear_filter in order to decouple the dual-semantics of the current
> revive
> >> call.
> >>
> >> As pointed out in the Mesos framework scalability guide
> >> <
> http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability
> >,
> >> utilizing the suppress
> >> <
> http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> >> call is the key to get your cluster to a large number of frameworks
> >> <
> https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf
> >.
> >> In short, when a framework is idling with no intention to launch any
> tasks,
> >> it should suppress to inform the Mesos to stop sending any more offers.
> And
> >> the framework should revive
> >> <
> http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> >> when new work arrives. This way, the allocator will skip the framework
> when
> >> performing resource allocations. As a result, thorny issues such as
> offer
> >> starvation and resource fragmentation would be greatly mitigated.
> >>
> >> That being said. The suppress/revive calls currently are a little bit
> >> unwieldy due to MESOS-9028
> >> <https://issues.apache.org/jira/browse/MESOS-9028>:
> >>
> >> The revive call has two semantics. It unsuppresses the framework AND
> >> clears all the existing filters. The latter makes the revive call
> >> non-idempotent. And sometimes users may want to keep the existing
> >> filters when reviving, which is not possible at the moment.
> >>
> >> To decouple the semantics, as suggested in the ticket, we propose to add
> >> two new V1 scheduler calls:
> >>
> >> (1) `UNSUPPRESS` call requests the Mesos to 

Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-03 Thread Vinod Kone
Hi Meng,

What would be the recommendation for framework authors on when to use
UNSUPPRESS vs CLEAR_FILTER?

Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?

On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu  wrote:

> Hi:
>
> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress and
> clear_filter in order to decouple the dual-semantics of the current revive
> call.
>
> As pointed out in the Mesos framework scalability guide, utilizing the
> suppress call is the key to getting your cluster to a large number of
> frameworks. In short, when a framework is idling with no intention to
> launch any tasks, it should suppress to inform Mesos to stop sending any
> more offers, and the framework should revive when new work arrives. This
> way, the allocator will skip the framework when performing resource
> allocations. As a result, thorny issues such as offer starvation and
> resource fragmentation would be greatly mitigated.
>
> That being said. The suppress/revive calls currently are a little bit
> unwieldy due to MESOS-9028
> :
>
> The revive call has two semantics. It unsuppresses the framework AND
> clears all the existing filters. The latter makes the revive call
> non-idempotent. And sometimes users may want to keep the existing filters
> when reviving, which is not possible at the moment.
>
> To decouple the semantics, as suggested in the ticket, we propose to add
> two new V1 scheduler calls:
>
> (1) `UNSUPPRESS` call requests the Mesos to resume sending offers;
> (2) `CLEAR_FILTER` call will explicitly clear all the existing filters.
>
> To make life easier, both calls will return 200 OK (as opposed to 202
> returned by most existing scheduler calls, including `SUPPRESS` and
> `REVIVE`).
>
> We will keep the revive call and its semantics (i.e. unsuppress AND clear
> filters) for backward compatibility.
>
> Note, the changes are proposed for V1 API only. Thus, once the changes are
> landed, framework developers are encouraged to move to V1 API to take
> advantage of the new calls (among many other benefits).
>
> Any feedback/comments are welcome.
>
> -Meng
>
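For illustration, the decoupled calls could be exercised as plain JSON bodies against the v1 scheduler endpoint. A minimal sketch — field names follow the v1 scheduler call convention, but the exact `UNSUPPRESS` and `CLEAR_FILTER` payload shapes are my assumption based on the proposal above, not a published API:

```python
# Illustrative only (not an official Mesos client): build JSON bodies for
# the existing REVIVE call and the two proposed decoupled calls.

def make_call(framework_id, call_type):
    # Shape mirrors v1 scheduler calls; UNSUPPRESS/CLEAR_FILTER are assumed.
    return {
        "framework_id": {"value": framework_id},
        "type": call_type,
    }

# Today: REVIVE both unsuppresses AND clears filters (non-idempotent).
revive = make_call("fw-123", "REVIVE")

# Proposed: decoupled semantics.
unsuppress = make_call("fw-123", "UNSUPPRESS")      # only resume offers
clear_filter = make_call("fw-123", "CLEAR_FILTER")  # only drop filters
```

In practice such bodies would be POSTed to the master's `/api/v1/scheduler` endpoint on the framework's subscribed connection.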


Re: Sidecar for dynamic resource allocation in Mesos

2018-11-29 Thread Vinod Kone
This is great to see. Thanks for sharing!

On Thu, Nov 29, 2018 at 1:59 PM Iwanowski, Maciej <
maciej.iwanow...@intel.com> wrote:

> Hello,
>
> My team at Intel has recently opensourced a Mesos-related project:
> Orchestration-aware Workload Collocation Agent (
> https://github.com/intel/owca). Goal of the project is to: "dynamically
> manage platform isolation mechanisms to ensure that high priority jobs meet
> their service level objective (SLO) and best-effort jobs effectively
> utilize as many idle resources as possible" (
> https://github.com/intel/owca/blob/master/docs/OWCA_Architecture_v1.5.pdf).
> As we are aiming at integrating other orchestration platform we decided to
> build a sidecar rather than a Mesos module. Library is written in Python
> because of ubiquity of the language in Machine Learning and Big Data
> environments - we are planning to experiment with various resource
> allocation and anomaly detection algorithms.
>
> I hope that you will find the idea interesting or even, one day,
> deployable 
>
> Maciej
> an engineer
>
> 
>
> Intel Technology Poland sp. z o.o.
> ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII
> Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP
> 957-07-52-316 | Kapital zakladowy 200.000 PLN.
>
> Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego
> adresata i moze zawierac informacje poufne. W razie przypadkowego
> otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej
> usuniecie; jakiekolwiek
> przegladanie lub rozpowszechnianie jest zabronione.
> This e-mail and any attachments may contain confidential material for the
> sole use of the intended recipient(s). If you are not the intended
> recipient, please contact the sender and delete all copies; any review or
> distribution by
> others is strictly prohibited.
>


Re: Propose to create a Kubernetes framework for Mesos

2018-11-28 Thread Vinod Kone
Cameron and Michal: I would love to understand your motivations and use
cases for a k8s Mesos framework in a bit more detail. Looks like you are
willing to rewrite your existing app definitions into k8s API spec. At this
point, why are you still interested in Mesos as a CAAS backend? Is it
because of scalability / reliability? Or is it because you still want to
run non-k8s workloads/frameworks in this world? What are these workloads?

In general, I'm in favor of Mesos coming shipped with a default scheduler.
I think it might help with the adoption similar to what happened with the
command/default executor. In hindsight, we should've done this a long time
ago. But, oh well, we were too optimistic that a single "default" scheduler
will rule in the ecosystem which didn't quite pan out.

However, I'm not sure if re-implementing the k8s scheduler as a Mesos framework
is the right approach. I imagine the k8s scheduler is a significant piece of
code which we would need to re-implement, and as new API objects are added to
the k8s API, we would need to keep pace with the k8s scheduler for parity. The
approach we (in the community) took with Spark (and Jenkins to some extent)
was for the scheduling innovation happen in Spark community and we just let
Spark launch spark executors via Mesos and let Spark launch its tasks out
of band of Mesos. We used to have a version of Spark framework (fine
grained mode?) where spark tasks were launched via Mesos offers but that
was deprecated, partly because of maintainability. Will this k8s framework
have a similar problem? It sounds like one of the problems with the existing
k8s framework implementations is the pre-launching of kubelets; can we use the
k8s autoscaler to solve that problem?

Also, I think (I might be wrong) most k8s users are not directly creating
pods via the API but rather using higher level abstractions like replica
sets, stateful sets, daemon sets etc. How will that fit into this
architecture? Will the framework need to re-implement those controllers as
well?

Is there an integration point in k8s ecosystem where we can reuse the
existing k8s schedulers and controllers but run the pods with mesos
container runtime?

All, in all, I'm +1 to explore the ideas in a WG.


On Wed, Nov 28, 2018 at 2:05 PM Paulo Pires  wrote:

> Hello all,
>
> As a Kubernetes fan, I am excited about this proposal.
> However, I would challenge this community to think more abstractly about
> the problem you want to address and any solution requirements before
> discussing implementation details, such as adopting VK.
>
> Don't take me wrong, VK is a great concept: a Kubernetes node that
> delegates container management to someone else.
> But allow me to clarify a few things about it:
>
> - VK simply provides a very limited subset of the kubelet functionality,
> namely the Kubernetes node registration and the observation of Pods that
> have been assigned to it. It doesn't do pod (intra or inter) networking nor
> delegates to CNI, doesn't do volume mounting, and so on.
> - Like the kubelet, VK doesn't implement scheduling. It also doesn't
> understand anything else than a Pod and its dependencies (e.g. ConfigMap or
> Secret), meaning other primitives, such as DaemonSet, Deployment,
> StatefulSet, or extensions, such as CRDs are unknown to the VK.
> - While the kubelet manages containers through CRI API (Container Runtime
> Interface), the VK does it through its own Provider API.
> - kubelet translates from Kubernetes primitives to CRI primitives, so CRI
> implementations only need to understand CRI. However, the VK does no
> translation and passes Kubernetes primitives directly to a provider,
> requiring the VK provider to understand Kubernetes primitives.
> - kubelet talks to CRI implementations through a gRPC socket. VK talks to
> providers in-process and is highly-opinionated about the fact a provider
> has no lifecycle (there's no _start_ or _stop_, as there would be for a
> framework). There are talks about having the Provider API over gRPC but it's not
> trivial to decide[2].
>
> Now, if you are still thinking about implementation details, and having
> some experience trying to create a VK provider for Mesos[1], I can tell you
> the VK, as is today, is not a seamless fit.
> That said, I am willing to help you figure out the design and pick the
> right pieces to execute, if this is indeed something you want to do.
>
> 1 -
> https://github.com/pires/virtual-kubelet/tree/mesos_integration/providers/mesos
> 2 - https://github.com/virtual-kubelet/virtual-kubelet/issues/160
>
> Cheers,
> Pires
>
> On Wed, Nov 28, 2018 at 5:38 AM Jie Yu  wrote:
>
>> + user list as well to hear more feedback from Mesos users.
>>
>> I am +1 on this proposal to create a Mesos framework that exposes k8s
>> API, and provide nodeless
>> 
>> experience to users.
>>
>> Creating Mesos framework that provides k8s API is not a new idea. For
>> instance, the 

Re: Rhythm - time-based job scheduler

2018-11-02 Thread Vinod Kone
Great to see that you are using the v1 API as well.

Would you like to be added to
https://github.com/apache/mesos/blob/master/docs/api-client-libraries.md
for more visibility? If yes, please do send a PR.

On Fri, Nov 2, 2018 at 2:23 PM Benjamin Mahler  wrote:

> Thanks for sharing Michał! Could you tell us how you (or your employer)
> are using it?
>
> On Tue, Oct 30, 2018 at 10:34 AM Michał Łowicki 
> wrote:
>
>> Hey!
>>
>> I would like to announce project I've been working on recently -
>> https://github.com/mlowicki/rhythm. It's a Cron-like scheduler with
>> couple of additional features:
>> * ACLs backend by either LDAP or GitLab
>> * Integration with Vault by HashiCorp for secrets management
>> * Support for both Docker and Mesos containerizers
>>
>> Feature requests / ideas / bug reports are more than welcome o/
>>
>> --
>> BR,
>> Michał Łowicki
>>
>


Re: Welcome Meng Zhu as PMC member and committer!

2018-10-31 Thread Vinod Kone
Congrats Meng!

Thanks,
Vinod

> On Oct 31, 2018, at 4:26 PM, Gilbert Song  wrote:
> 
> Well deserved, Meng!
> 
>> On Wed, Oct 31, 2018 at 2:36 PM Benjamin Mahler  wrote:
>> Please join me in welcoming Meng Zhu as a PMC member and committer!
>> 
>> Meng has been active in the project for almost a year and has been very 
>> productive and collaborative. He is now one of the few people who understands 
>> the allocator code well, as well as the roadmap for this area of the 
>> project. He has also found and fixed bugs, and helped users in slack.
>> 
>> Thanks for all your work so far Meng, I'm looking forward to more of your 
>> contributions in the project.
>> 
>> Ben


Re: Propose to run debug container as the same user of its parent container by default

2018-10-25 Thread Vinod Kone
Sounds good to me.

If I understand correctly, you want to treat this as a bug and backport it
to previous release branches? So, you are also asking whether backporting
this fix will be considered a breaking change for any existing users?

On Thu, Oct 25, 2018 at 11:46 AM James Peach  wrote:

>
>
> On Oct 23, 2018, at 7:47 PM, Qian Zhang  wrote:
>
> Hi all,
>
> Currently when launching a debug container (e.g., via `dcos task exec` or
> command health check) to debug a task, by default Mesos agent will use the
> executor's user as the debug container's user. There are actually 2 cases:
> 1. Command task: Since the command executor's user is same with command
> task's user, so the debug container will be launched as the same user of
> the command task.
> 2. The task in a task group: The default executor's user is same with the
> framework user, so in this case the debug container will be launched as the
> same user of the framework rather than the task.
>
> Basically I think the behavior of case 1 is correct. For case 2, we may
> run into a situation that the task is run as a user (e.g., root), but the
> debug container used to debug that task is run as another user (e.g., a
> normal user, suppose framework is run as a normal user), this may not be
> what user expects.
>
> So I created MESOS-9332  and
> propose to run debug container as the same user of its parent container
> (i.e., the task to be debugged) by default. Please let me know if you have
> any comments, thanks!
>
>
> This sounds like a sensible default to me. I can imagine for debug use
> cases you might want to run the debug container as root or give it elevated
> capabilities, but that should not be the default.
>
> J
>


Re: [VOTE] Release Apache Mesos 1.5.2 (rc1)

2018-10-24 Thread Vinod Kone
-1

Tested on ASF CI. Looks like Clang builds are failing with a build error.
See example build output

below:

libtool: compile:  clang++-3.5 -DPACKAGE_NAME=\"mesos\"
-DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.5.2\"
"-DPACKAGE_STRING=\"mesos 1.5.2\"" -DPACKAGE_BUGREPORT=\"\"
-DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"1.5.2\"
-DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1
-DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\"
-DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1
-DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1
-DMESOS_HAS_JAVA=1 -DHAVE_EVENT2_EVENT_H=1 -DHAVE_LIBEVENT=1
-DHAVE_EVENT2_THREAD_H=1 -DHAVE_LIBEVENT_PTHREADS=1 -DHAVE_LIBSASL2=1
-DHAVE_OPENSSL_SSL_H=1 -DHAVE_EVENT2_BUFFEREVENT_SSL_H=1
-DHAVE_LIBEVENT_OPENSSL=1 -DUSE_SSL_SOCKET=1 -DHAVE_SVN_VERSION_H=1
-DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1
-DHAVE_ZLIB_H=1 -DHAVE_LIBZ=1 -DHAVE_PYTHON=\"2.7\"
-DMESOS_HAS_PYTHON=1 -I. -I../../src -Werror
-DLIBDIR=\"/mesos/mesos-1.5.2/_inst/lib\"
-DPKGLIBEXECDIR=\"/mesos/mesos-1.5.2/_inst/libexec/mesos\"
-DPKGDATADIR=\"/mesos/mesos-1.5.2/_inst/share/mesos\"
-DPKGMODULEDIR=\"/mesos/mesos-1.5.2/_inst/lib/mesos/modules\"
-I../../include -I../include -I../include/mesos -DPICOJSON_USE_INT64
-D__STDC_FORMAT_MACROS -isystem ../3rdparty/boost-1.53.0 -isystem
../3rdparty/concurrentqueue-7b69a8f -I../3rdparty/elfio-3.2
-I../3rdparty/glog-0.3.3/src -I../3rdparty/leveldb-1.19/include
-I../../3rdparty/libprocess/include -I../3rdparty/nvml-352.79
-I../3rdparty/picojson-1.3.0 -I../3rdparty/protobuf-3.5.0/src
-I../../3rdparty/stout/include
-I../3rdparty/zookeeper-3.4.8/src/c/include
-I../3rdparty/zookeeper-3.4.8/src/c/generated -isystem
/usr/include/subversion-1 -isystem /usr/include/apr-1 -isystem
/usr/include/apr-1.0 -pthread -Wall -Wsign-compare -Wformat-security
-fstack-protector-strong -fPIC -g1 -O0 -std=c++11 -MT
slave/containerizer/libmesos_no_3rdparty_la-containerizer.lo -MD -MP
-MF slave/containerizer/.deps/libmesos_no_3rdparty_la-containerizer.Tpo
-c ../../src/slave/containerizer/containerizer.cpp  -fPIC -DPIC -o
slave/containerizer/.libs/libmesos_no_3rdparty_la-containerizer.o
In file included from ../../src/slave/http.cpp:30:
In file included from ../../include/mesos/authorizer/authorizer.hpp:25:
../../3rdparty/libprocess/include/process/future.hpp:1089:3: error: no
matching member function for call to 'set'
  set(u);
  ^~~
../../src/slave/http.cpp:3196:10: note: in instantiation of function
template specialization
'process::Future::Future > >' requested here
  return slave->containerizer->attach(containerId)
 ^
../../3rdparty/libprocess/include/process/future.hpp:597:8: note:
candidate function not viable: no known conversion from 'const
process::Future >' to
'const process::http::Response' for 1st argument
  bool set(const T& _t);
   ^
../../3rdparty/libprocess/include/process/future.hpp:598:8: note:
candidate function not viable: no known conversion from 'const
process::Future >' to
'process::http::Response' for 1st argument
  bool set(T&& _t);
   ^







On Mon, Oct 22, 2018 at 12:53 AM Gilbert Song  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.5.2.
>
> 1.5.2 includes the following:
>
> 
>   * [MESOS-3790] - ZooKeeper connection should retry on `EAI_NONAME`.
>   * [MESOS-8128] - Make os::pipe file descriptors O_CLOEXEC.
>   * [MESOS-8418] - mesos-agent high cpu usage because of numerous
> /proc/mounts reads.
>   * [MESOS-8545] -
> AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
>   * [MESOS-8568] - Command checks should always call
> `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`.
>   * [MESOS-8620] - Containers stuck in FETCHING possibly due to
> unresponsive server.
>   * [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent
> volume data.
>   * [MESOS-8871] - Agent may fail to recover if the agent dies before
> image store cache checkpointed.
>   * [MESOS-8904] - Master crash when removing quota.
>   * [MESOS-8906] - `UriDiskProfileAdaptor` fails to update profile
> selectors.
>   * [MESOS-8917] - Agent leaking file descriptors into forked processes.
>   * [MESOS-8921] - Autotools don't work with newer OpenJDK versions.
>   * [MESOS-8935] - Quota limit "chopping" can lead to cpu-only and
> memory-only offers.
>   * [MESOS-8936] - Implement a Random Sorter for offer allocations.
>   * [MESOS-8942] - Master streaming API does not send 

Re: Request for Comments - Health Check API Proposal

2018-10-17 Thread Vinod Kone
One of the things we discussed when we added `CheckInfo` and
`CheckStatusInfo` was to make the older `HealthCheck` and `bool healthy`
field (inside `TaskStatus`) consistent with the new `Check` format.

IIRC, some of the changes we wanted to do were

   - Deprecate `HealthCheck` and introduce a new `HealthCheckInfo` proto
   - The nested messages inside `HealthCheck` (e.g., `HTTPCheckInfo`)
   should be named differently in `HealthCheckInfo` (e.g., `Http`)
   - Deprecate `bool healthy` in TaskStatusInfo and introduce a new
   `HealthCheckStatusInfo` which looks similar to `CheckStatusInfo`

Right now, the proposal seems to only address the last point without
addressing the first two, which feels weird to me. I would prefer to see
them addressed in one shot.

Additionally, the proposed `HealthCheckStatusInfo` proto looks completely
different from `CheckStatusInfo`. Is that intentional? I hope we are not
thinking of deprecating it again when we come around to fix `HealthCheck`
proto to be consistent with `CheckInfo` ?

Thanks,

On Wed, Oct 17, 2018 at 1:26 PM Greg Mann  wrote:

> Hi all,
> Some users have recently reported issues with our current implementation
> of health checks. See this ticket
>  for an introduction to
> the issue.
>
> To summarize: we currently use a single 'optional bool healthy' field
> within the 'TaskStatus' message to indicate the result of a health check.
> This allows us to expose 3 health states to users:
> 1) 'healthy' field is unset = no health check specified, or health check
> failed but grace period has not yet elapsed, or health check has not yet
> been attempted
> 2) 'healthy' field is set to 'false' = a health check is specified and it
> returned 'false'
> 3) 'healthy' field is set to 'true' = a health check is specified and it
> returned 'true'
>
> The issue is that some users need to distinguish between the three
> scenarios in #1: no health check is specified, OR the task is not yet
> healthy but we are in the grace period. An example use case would be a load
> balancer which needs to wait for a healthy status to route traffic, but
> which immediately routes traffic to tasks which have no health check
> defined.
>
> This issue was recognized during the design of Mesos generalized checks;
> for those checks, we use the presence of the 'check_status' field to
> indicate whether or not a check is defined for the task. While consumers
> could make use of generalized checks as a workaround, this does not allow
> them to both detect the presence of a check AND achieve the task-killing
> behavior that health checks provide.
>
> In order to address this, I would like to propose the following new
> message, and an addition to the 'TaskStatus' message:
>
> message HealthCheckStatusInfo {
>   enum Status {
>     UNKNOWN = 0;
>     HEALTHY = 1;
>     UNHEALTHY = 2;
>   }
>
>   required Status status = 1;
> }
>
> message TaskStatus {
>   . . .
>
>   optional HealthCheckStatusInfo health_check_status = 17;
>
>   . . .
> }
>
> The semantics of these fields would be as follows:
>
> 'health_check_status' field:
> - If set, a health check has been set
> - If unset, a health check has not been set
>
> 'health_check_status.status' field:
> - UNKNOWN: The task has not become healthy but is still within its grace
> period (this state is also used if an internal error prevents us from
> running the health check successfully)
> - HEALTHY: The health check indicates the task is healthy
> - UNHEALTHY: The health check indicates the task is not healthy
>
> This change would also involve deprecating the existing 'healthy' field.
> In accordance with our deprecation policy, I believe we could not remove
> the deprecated field until we have a new major release (2.x).
>
> I'd love to hear feedback on this proposal, thanks in advance! I'll also
> add this as an agenda item to our upcoming API working group meeting on
> Tuesday, Oct. 16 at 11am PST.
>
> Cheers,
> Greg
>
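To make the motivating load-balancer case concrete, here is a hedged sketch — not an official Mesos client API — of how a consumer could disambiguate the three states once the proposed `health_check_status` field exists. The key names mirror the proto sketched in the email above:

```python
# Hedged sketch: a consumer (e.g. a load balancer) classifying a task status
# under the proposed API. The 'health_check_status' key and its 'status'
# values follow the proposal above; nothing here is an official client API.

def classify(task_status):
    hcs = task_status.get("health_check_status")
    if hcs is None:
        return "NO_HEALTH_CHECK"  # no check defined: route traffic immediately
    # "UNKNOWN" = check defined but still in grace period,
    # "HEALTHY" / "UNHEALTHY" = check result.
    return hcs["status"]

# With only the legacy 'healthy' bool, these two cases were indistinguishable:
no_check = {}                                              # no check defined
in_grace = {"health_check_status": {"status": "UNKNOWN"}}  # check in grace period
```

The load balancer can then route traffic for `NO_HEALTH_CHECK` and `HEALTHY`, and hold off for `UNKNOWN` and `UNHEALTHY`.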


Re: [dcos] Vote now for MesosCon 2018 proposals!

2018-09-20 Thread Vinod Kone
Voted!

I see some really good proposals in there. Really looking forward to the
final program!

On Thu, Sep 20, 2018 at 11:51 AM Jörg Schad  wrote:

> Dear Mesos Community,
>
> Please take a few minutes over the next few days and review what members
> of the community have submitted for MesosCon 2018
>  (which will be held in San Francisco between
> November 5th-7th)!
> To make voting easier, we structured the voting following the different
> tracks.
> Please visit the following links and submit your responses. Look through
> as few or as many talks as you'd like to, and give us your feedback on
> these talks.
>
> Core Track: https://www.surveymonkey.com/r/mesoscon18-core
> Ecosystem Track: https://www.surveymonkey.com/r/mesoscon18-ecosystem
> DC/OS Track: https://www.surveymonkey.com/r/mesoscon18-dcos
> Frameworks Track: https://www.surveymonkey.com/r/mesoscon18-frameworks
> Operations Tracks: https://www.surveymonkey.com/r/mesoscon18-operations
> Misc Track: https://www.surveymonkey.com/r/mesoscon18-misc
> User Track: https://www.surveymonkey.com/r/mesoscon18-users
>
> Please submit your votes until Wednesday, Sept 26th 11:59 PM PDT, so you
> have one week to vote and make your voice heard!
>
> Thank you for your help and looking forward to a great MesosCon!
> Your MesosCon PC
>
> --
> You received this message because you are subscribed to the Google Groups
> "users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to users+unsubscr...@dcos.io.
> To post to this group, send email to us...@dcos.io.
> To view this discussion on the web visit
> https://groups.google.com/a/dcos.io/d/msgid/users/CALPK6M5jiT8jwm-GGrx9zV5ih17EfreGbZG5zrTfpB%3Dz13OoMA%40mail.gmail.com
> 
> .
>


Re: [VOTE] Release Apache Mesos 1.7.0 (rc2)

2018-08-29 Thread Vinod Kone
I prefer 1) since you already have the fix. 

Thanks,
Vinod

> On Aug 29, 2018, at 8:44 PM, Chun-Hung Hsiao  wrote:
> 
> I found two issues when compiling with clang 3.5:
> 
> 1. The `-Wno-inconsistent-missing-override` option added in 
> https://reviews.apache.org/r/67953/
> is not recognized by clang 3.5.
> 2. The same issue described in https://reviews.apache.org/r/55400/ would make
> `src/resource_provider/storage/provider.cpp` fail to compile.
> 
> I put up two patches to resolve the above issues (no review posted yet):
> https://github.com/chhsia0/mesos/commit/1f60aa3b3a7eede4a2a5ddf1288efff6a801ea97
> https://github.com/chhsia0/mesos/commit/84d13a0468f34726e4a920915cdda7e0e0a829b8
> 
> However, I'm not sure if this is worth blocking a release. We have 2 options:
> 1. Fail this vote and cut rc3 with the above patches to support clang 3.5.
> 2. Keep rc2 but bump the version requirement for clang on the website. (If 
> so, then the above patches are not needed.)
> 
> I was wondering which option would be more appropriate so I'd like to ask for 
> some feedbacks. Thanks!
> 
>> On Wed, Aug 29, 2018 at 10:18 AM James Peach  wrote:
>> +1 (binding)
>> 
>> Built and tested on Fedora 28 (clang).
>> 
>>> On Aug 24, 2018, at 4:42 PM, Chun-Hung Hsiao  wrote:
>>> 
>>> Hi all,
>>> 
>>> Please vote on releasing the following candidate as Apache Mesos 1.7.0.
>>> 
>>> 
>>> 1.7.0 includes the following:
>>> 
>>> * Performance Improvements:
>>>   * Master `/state` endpoint: ~130% throughput improvement through RapidJSON
>>>   * Allocator: Improved allocator cycle significantly
>>>   * Agent `/containers` endpoint: Fixed a performance issue
>>>   * Agent container launch / destroy throughput is significantly improved
>>> * Containerization:
>>>   * **Experimental** Supported docker image tarball fetching from HDFS
>>>   * Added new `cgroups/all` and `linux/devices` isolators
>>>   * Added metrics for `network/cni` isolator and docker pull latency
>>> * Windows:
>>>   * Added support to libprocess for the Windows Thread Pool API
>>> * Multi-Framework Workloads:
>>>   * **Experimental** Added per-framework metrics to the master
>>>   * A new weighted random sorter was added as an alternative to the DRF 
>>> sorter
>>> 
>>> The CHANGELOG for the release is available at:
>>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.7.0-rc2
>>> 
>>> 
>>> The candidate for Mesos 1.7.0 release is available at:
>>> https://dist.apache.org/repos/dist/dev/mesos/1.7.0-rc2/mesos-1.7.0.tar.gz
>>> 
>>> The tag to be voted on is 1.7.0-rc2:
>>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.7.0-rc2
>>> 
>>> The SHA512 checksum of the tarball can be found at:
>>> https://dist.apache.org/repos/dist/dev/mesos/1.7.0-rc2/mesos-1.7.0.tar.gz.sha512
>>> 
>>> The signature of the tarball can be found at:
>>> https://dist.apache.org/repos/dist/dev/mesos/1.7.0-rc2/mesos-1.7.0.tar.gz.asc
>>> 
>>> The PGP key used to sign the release is here:
>>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>> 
>>> The JAR is in a staging repository here:
>>> https://repository.apache.org/content/repositories/orgapachemesos-1233
>>> 
>>> Please vote on releasing this package as Apache Mesos 1.7.0!
>>> 
>>> The vote is open until Mon Aug 27 16:37:35 PDT 2018 and passes if a 
>>> majority of at least 3 +1 PMC votes are cast.
>>> 
>>> [ ] +1 Release this package as Apache Mesos 1.7.0
>>> [ ] -1 Do not release this package because ...
>>> 
>>> Thanks,
>>> Chun-Hung & Gaston
>> 


Re: MesosCon 2018 Location Change

2018-08-26 Thread Vinod Kone
+1 for Bay area 

Thanks,
Vinod

> On Aug 26, 2018, at 12:02 AM, Vaibhav Khanduja  
> wrote:
> 
> +1 for bay area.
> 
> Thx
> 
>> On Sat, Aug 25, 2018, 3:20 PM Jörg Schad  wrote:
>> 
>> Just one more comment on the reasoning here:
>> We (i.e., the PC) want MesosCon to be a user-driven conference and hence
>> have the conference at a location where we can gather most users.
>> We understand it might be more difficult to travel to the Bay Area from
>> Europe, but are already considering EU timezone friendly working groups
>> meetings which could be joined remotely. Stay tuned here.
>> We understand this is a beyond last minute change, but we are considering
>> as a result of community (i.e., everyone here) feedback.
>> 
>> Please also consider this is the first time we are organizing MesosCon as
>> community ourselves (the previous years it was organized by Linux
>> Foundation) and so far I must say kudos to everyone involved. It is great
>> to see everyone working on making it a great Mesos (+Marathon, + Paasta, +
>> ...) community conference!
>> 
>> Also feel free to reach out personally if you have questions!
>> 
>> 
>> 
>>> On Fri, Aug 24, 2018 at 2:23 PM, Sunil Shah  wrote:
>>> 
>>> Hey everyone,
>>> 
>>> As we continue to organise this year's MesosCon, I wanted to ask for your
>>> preferences on location of the conference. Several community members have
>>> expressed a desire to have the conference in the Bay Area (as opposed to
>>> New York, as currently planned).
>>> 
>>> As a reminder, this year's MesosCon is a community run conference and is
>>> planned for November 5th to 7th.
>>> 
>>> Please let me know if you have any strong feelings one way or another and
>>> I'll take a summary back to the MesosCon Committee.
>>> 
>>> Cheers,
>>> 
>>> Sunil
>>> (P.S., If you haven't submitted a talk already, please do!)
>>> 
>>> 
>>> 
>> 


Re: [VOTE] Release Apache Mesos 1.4.2 (rc1)

2018-08-14 Thread Vinod Kone
I see some flaky tests in ASF CI, that I don't see already reported.

@Kapil Arya   Can you take a look at
https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/53 and see
if the flaky tests are due to bugs in test code and not source?

*Revision*: 612ec2c63a68b4d5b60d1d864e6703fde1c2a023

   - refs/tags/1.4.2-rc1

Configuration Matrix (result per build: gcc / clang):

  centos:7      --verbose --enable-libevent --enable-ssl  autotools  Failed  / Not run
  centos:7      --verbose --enable-libevent --enable-ssl  cmake      Success / Not run
  centos:7      --verbose                                 autotools  Success / Not run
  centos:7      --verbose                                 cmake      Success / Not run
  ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools  Failed  / Failed
  ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake      Failed  / Success
  ubuntu:14.04  --verbose                                 autotools  Success / Failed
  ubuntu:14.04  --verbose                                 cmake      Success / Success

On Mon, Aug 13, 2018 at 7:41 PM Benjamin Mahler  wrote:

> +1 (binding)
>
> make check passes on macOS 10.13.6 with Apple LLVM version 9.1.0
> (clang-902.0.39.2).
>
> Thanks Kapil!
>
> On Wed, Aug 8, 2018 at 3:06 PM, Kapil Arya  wrote:
>
> > Hi all,
> >
> > Please vote on releasing the following candidate as Apache Mesos 1.4.2.
> >
> > 1.4.2 is a bug fix release. The CHANGELOG for the release is available
> at:
> > https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_
> > plain;f=CHANGELOG;hb=1.4.2-rc1
> >
> > The candidate for Mesos 1.4.2 release is available at:
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.4.2-rc1/mesos-1.4.2.tar.gz
> >
> > The tag to be voted on is 1.4.2-rc1:
> > https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.4.2-rc1
> >
> > The SHA512 checksum of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/1.4.2-rc1/
> > mesos-1.4.2.tar.gz.sha512
> >
> > The signature of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/1.4.2-rc1/
> > mesos-1.4.2.tar.gz.asc
> >

Re: Mesos task history after restart

2018-08-02 Thread Vinod Kone
Mesos doesn't store task information in ZooKeeper. It is stored in Mesos
master's memory and recovered from Mesos agents after a master restart /
leader election.

The task history can be controlled by `max_completed_frameworks`,
`max_completed_tasks_per_framework` master flags.

AFAICT, the completed frameworks/executors/tasks information is recovered
too from the agents after a master failover. Are you not seeing them after
a master failover? Which version are you running?

On Tue, Jul 31, 2018 at 1:02 PM samiksha baskar 
wrote:

> I am using mesos for container orchestration and get task history from
> mesos using */task* endpoint.
>
> Mesos is running in a 7-node cluster and ZooKeeper is running in a 3-node
> cluster. I assume Mesos uses ZooKeeper to store the task history. We sometimes
> lose history when we restart Mesos. Does it store it in memory? I am
> trying to understand what is happening here.
>
> My questions are,
>
>1. Where does it store task histories?
>2. How can we configure the task history cleanup policy?
>3. Why do we lose complete task history on restarting mesos?
>
>
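A toy model of the bounded in-memory history described above — not the master's actual C++ implementation — assuming only that the completed-task history is capped per framework by `--max_completed_tasks_per_framework` with oldest-first eviction:

```python
from collections import deque

# Toy model (not the real Mesos master code): each framework keeps at most
# --max_completed_tasks_per_framework completed tasks in master memory,
# evicting the oldest first. The real default is much larger than this.
MAX_COMPLETED_TASKS_PER_FRAMEWORK = 3

completed = deque(maxlen=MAX_COMPLETED_TASKS_PER_FRAMEWORK)
for task_id in ["t1", "t2", "t3", "t4", "t5"]:
    completed.append(task_id)

# Only the newest N survive — and since nothing is persisted to ZooKeeper,
# all of it is lost if the master process dies without agents to recover from.
print(list(completed))  # ['t3', 't4', 't5']
```

This is why tuning the two master flags changes how much history the `/tasks` endpoint can return, and why a hard restart can drop it entirely.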


[RESULT] [VOTE] Move the project repos to gitbox

2018-07-20 Thread Vinod Kone
Hi,

This vote has passed with 7 +1s and no 0s or -1s!

+1 (binding)
-
Vinod Kone
James Peach
Zhitao Li
Andrew Schwartzmeyer
Jie Yu
Greg Mann
Gaston Kleiman

I'll file an INFRA ticket to get the process in motion.

Thanks,
Vinod


On Tue, Jul 17, 2018 at 8:27 PM Gastón Kleiman  wrote:

> On Tue, Jul 17, 2018 at 7:59 AM Vinod Kone  wrote:
>
>> Hi,
>>
>> As discussed in another thread and in the committers sync, there seem to
>> be
>> heavy interest in moving our project repos ("mesos", "mesos-site") from
>> the
>> "git-wip" git server to the new "gitbox" server to better avail GitHub
>> integrations.
>>
>> Please vote +1, 0, -1 regarding the move to gitbox. The vote will close in
>> 3 business days.
>>
>
> +1
>


[VOTE] Move the project repos to gitbox

2018-07-17 Thread Vinod Kone
Hi,

As discussed in another thread and in the committers sync, there seem to be
heavy interest in moving our project repos ("mesos", "mesos-site") from the
"git-wip" git server to the new "gitbox" server to better avail GitHub
integrations.

Please vote +1, 0, -1 regarding the move to gitbox. The vote will close in
3 business days.

Thanks,
Vinod


Re: Backport Policy

2018-07-16 Thread Vinod Kone
> >>> unforeseen consequences, which I
> >>> believe is something to be actively avoided in already released
> versions.
> >>> The reason for backporting patches to fix regressions is the same as
> the
> >>> reason to avoid backporting as much as possible: keep behavior
> consistent
> >>> (and safe) within a release. With that as the goal of a branch in
> >>> maintenance mode, it makes sense to fix regressions, and make
> exceptions to
> >>> fix CVEs and other critical/blocking issues.
> >>>
> >>> As for who should decide what to backport, I lean toward Ben's view of
> >>> the burden being on the committer. I don't think we should add more
> work
> >>> for release managers, and I think the committer/shepherd obviously has
> the
> >>> most understanding of the context around changes proposed for backport.
> >>>
> >>> Here's an example of a recent bugfix which I backported:
> >>> https://reviews.apache.org/r/67587/ (for MESOS-3790)
> >>>
> >>> While normally I believe this change falls under "avoid due to
> >>> unforeseen consequences," I made an exception as the bug was old, circa
> >>> 2015, (indicating it had been an issue for others), and was causing
> >>> recurring failures in testing. The fix itself was very small, meaning
> it
> >>> was easier to evaluate for possible side effects, so I felt a little
> safer
> >>> in that regard. The effect of not having the fix was a fatal and
> undesired
> >>> crash, which furthermore left troublesome side effects on the system
> (you
> >>> couldn't bring the agent back up). And lastly, a dependent project
> (DC/OS)
> >>> wanted it in their next bump, which necessitated backporting to the
> release
> >>> they were pulling in.
> >>>
> >>> I think in general we should backport only as necessary, and leave it
> on
> >>> the committers to decide if backporting a particular change is
> necessary.
> >>>
> >>>
> >>> On 07/13/2018 12:54 am, Alex Rukletsov wrote:
> >>>
> >>>> This is exactly where our views differ, Ben : )
> >>>>
> >>>> Ideally, I would like a release manager to have more ownership and
> less
> >>>> manual work. In my imagination, a release manager has more power and
> >>>> control about dates, features, backports and everything that is
> related
> >>>> to
> >>>> "their" branch. I would also like us to back port as little as
> >>>> possible, to
> >>>> simplify testing and releasing patch versions.
> >>>>
> >>>> On Fri, Jul 13, 2018 at 1:17 AM, Benjamin Mahler 
> >>>> wrote:
> >>>>
> >>>>> +user, it would probably be good to hear from users as well.
> >>>>>
> >>>>> Please see the original proposal as well as Alex's proposal and let
> us
> >>>>> know
> >>>>> your thoughts.
> >>>>>
> >>>>> To continue the discussion from where Alex left off:
> >>>>>
> >>>>> > Other bugs and significant improvements, e.g., performance, may be
> >>>>> back
> >>>>> ported,
> >>>>> the release manager should ideally be the one who decides on this.
> >>>>>
> >>>>> I'm a little puzzled by this, why is the release manager involved? As
> >>>>> we
> >>>>> already document, backports occur when the bug is fixed, so this
> >>>>> happens in
> >>>>> the steady state of development, not at release time. The release
> >>>>> manager
> >>>>> only comes in at the time of the release itself, at which point all
> >>>>> backports have already happened and the release manager handles the
> >>>>> release
> >>>>> process. Only blocker level issues can stop the release and while the
> >>>>> release manager has a strong say, we should generally agree on what
> >>>>> consists of a release blocking issue.
> >>>>>
> >>>>> Just to clarify my workflow, I generally backport every bug fix I
> >>>>> commit
> >>>>> that applies cleanly, right after I commit it to master (with the
> >>>>> exceptions I listed below).
> >>>>>
> >&
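The per-commit backport workflow discussed in this thread (land the fix on master first, then cherry-pick it onto the maintenance branch) can be sketched as below. This is an illustrative stand-in, not the real Mesos repository: the file name, commit messages, and the `1.6.x` branch name are all hypothetical.

```python
import os
import subprocess
import tempfile
from pathlib import Path

def git(*args):
    """Run a git command in the current directory and return its stdout."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout

# Set up a throwaway repository standing in for the project repo.
os.chdir(tempfile.mkdtemp())
git("init", "-q")
git("config", "user.email", "demo@example.com")
git("config", "user.name", "demo")

Path("agent.cpp").write_text("v1\n")
git("add", "agent.cpp")
git("commit", "-qm", "Initial code.")
git("branch", "1.6.x")  # hypothetical maintenance branch at the release point

# The bug fix lands on the development branch first.
Path("agent.cpp").write_text("v1 fixed\n")
git("commit", "-qam", "Fixed agent crash on restart.")
fix_sha = git("rev-parse", "HEAD").strip()

# Backport: cherry-pick onto the maintenance branch right after committing.
# -x appends "(cherry picked from commit ...)" so the backport's origin is
# recorded in the commit message.
git("checkout", "-q", "1.6.x")
git("cherry-pick", "-x", fix_sha)
print(git("log", "-1", "--pretty=%B").splitlines()[0])
# prints: Fixed agent crash on restart.
```

The `-x` marker is what lets a release manager later audit which mainline commits a maintenance branch already contains.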

Re: [VOTE] Release Apache Mesos 1.6.1 (rc2)

2018-07-13 Thread Vinod Kone
+1 (binding)

Ran through ASF CI. The red builds were known flaky health check / check tests.

*Revision*: ae82dd5cc6f415916702897acfd3085b6387b118

   - refs/tags/1.6.1-rc2

Configuration Matrix gcc clang
centos:7 --verbose --enable-libevent --enable-ssl autotools
[image: Failed]

[image: Not run]
cmake
[image: Success]

[image: Not run]
--verbose autotools
[image: Failed]

[image: Not run]
cmake
[image: Success]

[image: Not run]
ubuntu:14.04 --verbose --enable-libevent --enable-ssl autotools
[image: Failed]

[image: Failed]

cmake
[image: Success]

[image: Success]

--verbose autotools
[image: Success]

[image: Success]

cmake
[image: Success]

[image: Success]



On Fri, Jul 13, 2018 at 12:48 PM Chun-Hung Hsiao 
wrote:

> +1 (binding)
>
> Tested on our internal CI. All green.
> Tested on my Mac with both autotools and CMake, with gRPC enabled.
> Failed tests:
>
> HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaHTTPWithContainerImage
> HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithContainerImage
> HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaTCPWithContainerImage
> FetcherCacheTest.LocalUncachedExtract
> FetcherCacheHttpTest.HttpMixed
>
> MesosContainerizer/DefaultExecutorTest.ROOT_INTERNET_CURL_DockerTaskWithFileURI
> MesosContainerizer/DefaultExecutorTest.ROOT_LaunchGroupFailure
>
> LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_PersistentResources
>
> LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_TaskSandboxPersistentVolume
>
> LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_TasksSharingViaSandboxVolumes
>
> LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_TaskGroupsSharingViaSandboxVolumes
>
> LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_HealthCheckUsingPersistentVolume
>
> All of the above tests require the `filesystem/linux` isolator so are
> supposed to fail 

Re: [VOTE] Release Apache Mesos 1.6.1 (rc1)

2018-06-27 Thread Vinod Kone
Hmm. A lot of tests failed when I ran this through ASF CI. I'm not sure if
all of these are known flaky tests.

https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/50/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console

https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/50/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console

On Wed, Jun 27, 2018 at 11:59 AM Jie Yu  wrote:

> +1
>
> Passed on our internal CI that has the following matrix. I looked into the
> only failed test, looks to be a flaky test due to a race in the test.
>
>
>
> On Tue, Jun 26, 2018 at 7:02 PM, Greg Mann  wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.6.1.
>>
>>
>> 1.6.1 includes the following:
>>
>> 
>> *Announce major features here*
>> *Announce major bug fixes here*
>>
>> The CHANGELOG for the release is available at:
>>
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.6.1-rc1
>>
>> 
>>
>> The candidate for Mesos 1.6.1 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/mesos-1.6.1.tar.gz
>>
>> The tag to be voted on is 1.6.1-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.6.1-rc1
>>
>> The SHA512 checksum of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/mesos-1.6.1.tar.gz.sha512
>>
>> The signature of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/mesos-1.6.1.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1229
>>
>> Please vote on releasing this package as Apache Mesos 1.6.1!
>>
>> The vote is open until Fri Jun 29 18:46:28 PDT 2018 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 1.6.1
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>> Greg
>>
>
>
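Release-vote emails like the one above ask reviewers to verify the candidate tarball against its published SHA512 checksum (and PGP signature) before voting. A minimal self-contained sketch of the checksum half, using a locally generated stand-in file rather than the real release artifacts:

```python
import hashlib
from pathlib import Path

# Stand-in for the downloaded release candidate (not a real tarball).
tarball = Path("mesos-demo.tar.gz")
tarball.write_bytes(b"stand-in release contents\n")

# Publisher side: the release manager publishes the SHA512 digest; this is
# what the mesos-X.Y.Z.tar.gz.sha512 file carries.
published_digest = hashlib.sha512(tarball.read_bytes()).hexdigest()

# Voter side: recompute locally and compare before building/testing.
local_digest = hashlib.sha512(tarball.read_bytes()).hexdigest()
print("checksum OK" if local_digest == published_digest else "MISMATCH")

# Any tampering changes the digest, which is the property being checked.
tampered_digest = hashlib.sha512(tarball.read_bytes() + b"x").hexdigest()
assert tampered_digest != published_digest

# The signature check is separate: import the KEYS file and run
#   gpg --verify mesos-X.Y.Z.tar.gz.asc mesos-X.Y.Z.tar.gz
```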


Re: New Mesos User Groups in the Netherlands and Belgium

2018-06-21 Thread Vinod Kone
Thanks for taking the lead on this Mats and Tomek!

I have pushed the commit. Please keep us updated on how the meetups go!


-- Vinod

On Thu, Jun 21, 2018 at 8:10 AM, Tomek Janiszewski 
wrote:

> PR: https://github.com/apache/mesos/pull/298/files
>
>
> czw., 21 cze 2018, 14:29 użytkownik Mats Uddenfeldt <
> mats.uddenfe...@gmail.com> napisał:
>
>> Hi there,
>>
>> We've recently launched Mesos User Groups in the Netherlands (
>> https://www.meetup.com/Dutch-Mesos-User-Group/) and Belgium (
>> https://www.meetup.com/Belgian-Mesos-User-Group/) and also noticed that
>> the old Amsterdam Mesos User Group has changed its topic away from Mesos.
>>
>> I tried to reach out to the creator of the old group, but he has no
>> interest in changing the topic back.
>>
>> Could you please update the list on the web site by removing the old
>> Amsterdam group and adding the new ones? If it would be possible to send
>> out a tweet for each group from the official Apache Mesos account it would
>> be great to boost awareness of the groups! Thanks!
>>
>> Cheers,
>>
>> Mats.
>>
>


Re: [VOTE] Release Apache Mesos 1.3.3 (rc1)

2018-05-31 Thread Vinod Kone
=
I0529 21:04:38.781270 28418 openssl.cpp:429] Will not verify peer certificate!
NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
I0529 21:04:38.781277 28418 openssl.cpp:435] Will only verify peer
certificate if presented!
NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
E0529 21:04:38.781814 28435 process.cpp:956] Failed to accept socket:
future discarded
*** Aborted at 1527627878 (unix time) try "date -d @1527627878" if you
are using GNU date ***
PC: @ 0x7fcbd5615dd6 __memcpy_ssse3_back
*** SIGSEGV (@0x5cabd78) received by PID 28418 (TID 0x7fcbcc6dd700)
from PID 97172856; stack trace: ***
I0529 21:04:38.797348 28418 process.cpp:1272] libprocess is
initialized on 172.17.0.3:47350 with 16 worker threads
@ 0x7fcbd66dd6d0 (unknown)
@ 0x7fcbd5615dd6 __memcpy_ssse3_back
@ 0x7fcbd5e636f0 (unknown)
@ 0x7fcbd5e63d9c (unknown)
@   0x42af09 process::UPID::UPID()
I0529 21:04:38.803799 29172 process.cpp:3741] Handling HTTP event for
process '(77)' with path: '/(77)/body'
@   0x8edfaa process::DispatchEvent::DispatchEvent()
@   0x8e6560 process::internal::dispatch()
I0529 21:04:38.809983 29176 process.cpp:3741] Handling HTTP event for
process '(77)' with path: '/(77)/pipe'
@   0x900ad8 process::dispatch<>()
@   0x8e548c process::ProcessBase::route()
I0529 21:04:38.821267 29181 process.cpp:3741] Handling HTTP event for
process '(77)' with path: '/(77)/body'
I0529 21:04:38.821970 29182 process.cpp:3798] Failed to process
request for '/(77)/body': failure
I0529 21:04:38.821995 29172 process.cpp:1482] Returning '500 Internal
Server Error' for '/(77)/body' (failure)
[   OK ] Scheme/HTTPTest.Endpoints/0 (227 ms)
[ RUN  ] Scheme/HTTPTest.Endpoints/1
@   0x9d823d process::ProcessBase::route<>()
@   0x9d4480 process::Help::initialize()
@   0x8de9e8 process::ProcessManager::resume()
@   0x8db3be _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
@   0x8ed63e
_ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
@   0x8ed582
_ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
@   0x8ed50c
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7fcbd5e5a070 (unknown)
@ 0x7fcbd66d5e25 start_thread
@ 0x7fcbd55bebad __clone
make[7]: *** [check-local] Segmentation fault





On Tue, May 29, 2018 at 12:28 PM, Benjamin Mahler 
wrote:

> +1 (binding)
>
> Make check passes on macOS 10.13.4 with Apple LLVM version 9.1.0
> (clang-902.0.39.1).
>
> On Wed, May 23, 2018 at 10:00 PM, Michael Park  wrote:
>
> > The tarball has been fixed, please vote now!
> >
> > 'twas BSD `tar` issues... :(
> >
> > Thanks,
> >
> > MPark
> >
> > On Wed, May 23, 2018 at 11:39 AM, Michael Park  wrote:
> >
> >> Huh... 樂 Super weird. I'll look into it.
> >>
> >> Thanks for checking!
> >>
> >> MPark
> >>
> >> On Wed, May 23, 2018 at 11:34 AM Vinod Kone 
> wrote:
> >>
> >>> It's empty for me too!
> >>>
> >>> On Wed, May 23, 2018 at 11:32 AM, Benjamin Mahler 
> >>> wrote:
> >>>
> >>>> Thanks Michael!
> >>>>
> >>>> Looks like the tar.gz is empty, is it just me?
> >>>>
> >>>> On Tue, May 22, 2018 at 10:09 PM, Michael Park 
> >>>> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> Please vote on releasing the following candidate as Apache Mesos
> 1.3.3.
> >>>>>
> >>>>> The CHANGELOG for the release is available at:
> >>>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
> >>>>> lain;f=CHANGELOG;hb=1.3.3-rc1
> >>>>> 
> >>>>> 
> >>>>>
> >>>>> The candidate for Mesos 1.3.3 release is available at:
> >>>>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos
> >>>>> -1.3.3.tar.gz
> >>>>>
> >>>>> The tag to be voted on is 1.3.3-rc1:
> >>>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit
> >>>>> ;h=1.3.3-rc1
> >>>>>
> >>>>> The SHA512 checksum of the tarball can be found at:
> >>>>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos
> >>>>> -1.3.3.tar.gz.sha512
> >>>>>
> >>>>> The signature of the tarball can be found at:
> >>>>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos
> >>>>> -1.3.3.tar.gz.asc
> >>>>>
> >>>>> The PGP key used to sign the release is here:
> >>>>> https://dist.apache.org/repos/dist/release/mesos/KEYS
> >>>>>
> >>>>> The JAR is up in Maven in a staging repository here:
> >>>>> https://repository.apache.org/content/repositories/
> orgapachemesos-1226
> >>>>>
> >>>>> Please vote on releasing this package as Apache Mesos 1.3.3!
> >>>>>
> >>>>> The vote is open until Fri May 25 22:07:39 PDT 2018 and passes if a
> >>>>> majority of at least 3 +1 PMC votes are cast.
> >>>>>
> >>>>> [ ] +1 Release this package as Apache Mesos 1.3.3
> >>>>> [ ] -1 Do not release this package because ...
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> MPark
> >>>>>
> >>>>
> >>>>
> >>>
> >
>


Re: [VOTE] Release Apache Mesos 1.3.3 (rc1)

2018-05-23 Thread Vinod Kone
It's empty for me too!

On Wed, May 23, 2018 at 11:32 AM, Benjamin Mahler 
wrote:

> Thanks Michael!
>
> Looks like the tar.gz is empty, is it just me?
>
> On Tue, May 22, 2018 at 10:09 PM, Michael Park  wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.3.3.
>>
>> The CHANGELOG for the release is available at:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
>> lain;f=CHANGELOG;hb=1.3.3-rc1
>> 
>> 
>>
>> The candidate for Mesos 1.3.3 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos-1.3.3.tar.gz
>>
>> The tag to be voted on is 1.3.3-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.3.3-rc1
>>
>> The SHA512 checksum of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos
>> -1.3.3.tar.gz.sha512
>>
>> The signature of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos
>> -1.3.3.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1226
>>
>> Please vote on releasing this package as Apache Mesos 1.3.3!
>>
>> The vote is open until Fri May 25 22:07:39 PDT 2018 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 1.3.3
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>>
>> MPark
>>
>
>


Re: Deprecating the Python bindings

2018-05-09 Thread Vinod Kone
One of the production users that I know of who used to depend on the python
bindings is https://github.com/douban.

Also, Apache Aurora used to have an executor that depended on the python
bindings.

I don't know what their dependencies are these days w.r.t. the python bindings.

On Wed, May 9, 2018 at 11:51 AM, Andrew Schwartzmeyer <
and...@schwartzmeyer.com> wrote:

> Hi all,
>
> There are two parallel efforts underway that would both benefit from
> officially deprecating (and then removing) the Python bindings. The first
> effort is the move to the CMake system: adding support to generate the
> Python bindings was investigated but paused (see MESOS-8118), and the
> second effort is the move to Python 3: producing Python 3 compatible
> bindings is under investigation but not in progress (see MESOS-7163).
>
> Benjamin Bannier, Joseph Wu, and I have all at some point just wondered
> how the community would fare if the Python bindings were officially
> deprecated and removed. So please, if this would negatively impact you or
> your project, let me know in this thread.
>
> Thanks,
>
> Andrew Schwartzmeyer
>


Re: Questions about secret handling in Mesos

2018-04-26 Thread Vinod Kone
We do direct protobuf to JSON conversion for our API endpoints, and I don't
think we do any special-case logic for the `Secret` type in that conversion. So
`value`-based secrets will have their values show up in the v1 (and likely v0)
API endpoints.
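The risk described above — a generic message-to-JSON converter exposing value-based secrets — can be illustrated with a toy sketch. Plain Python dicts stand in for protobuf messages here; none of this is actual Mesos code, and the field names mirror the `Secret` shape only loosely.

```python
import json

# Toy stand-in for a task message carrying a value-based secret.
task_info = {
    "name": "demo-task",
    "secret": {"type": "VALUE", "value": {"data": "s3cr3t-password"}},
}

def message_to_json(message):
    # Naive direct conversion: every field is serialized verbatim, with no
    # notion of which fields are sensitive. This is the failure mode the
    # email describes.
    return json.dumps(message)

def message_to_json_redacted(message):
    # What a secret-aware converter would have to do instead: recognize the
    # secret-shaped node and scrub it before serializing.
    def redact(node):
        if isinstance(node, dict):
            if node.get("type") == "VALUE" and "value" in node:
                return {"type": "VALUE", "value": "<redacted>"}
            return {k: redact(v) for k, v in node.items()}
        if isinstance(node, list):
            return [redact(v) for v in node]
        return node
    return json.dumps(redact(message))

print("s3cr3t-password" in message_to_json(task_info))           # True: leaked
print("s3cr3t-password" in message_to_json_redacted(task_info))  # False
```

The point of the sketch is that redaction must be opted into explicitly for every endpoint; direct conversion leaks by default.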

On Mon, Apr 23, 2018 at 9:25 AM, Zhitao Li  wrote:

> Hi Alexander,
>
> We discovered that in our own testing thus do not plan to use the
> environment variable. For the `volume/secret` case, I believe it's possible
> to be careful enough so we do not log that, so it's more about whether we
> want to promise that.
>
> What do you think?
>
> On Mon, Apr 23, 2018 at 5:13 AM, Alexander Rojas 
> wrote:
>
>>
>> Hey Zhitao,
>>
>> I sadly have to tell you that the first assumption is not correct. If you
>> use environment based secrets, docker and verbose mode, they will get
>> printed (see this patch https://reviews.apache.org/r/57846/). The reason
>> is that the docker command will get logged and it might contain your
>> secrets. You may end up with some logging line like:
>>
>> ```
>> I0129 14:09:22.444318 docker.cpp:1139] Running docker -H
>> unix:///var/run/docker.sock run --cpu-shares 25 --memory 278435456 -e
>> ADMIN_PASSWORD=test_password …
>> ```
>>
>>
>> On 19. Apr 2018, at 19:57, Zhitao Li  wrote:
>>
>> Hello,
>>
>> We at Uber plan to use volume/secret isolator to send secrets from Uber
>> framework to Mesos agent.
>>
>> For this purpose, we are referring to these documents:
>>
>>- File based secrets design doc
>>
>> 
>>and slides
>>
>> 
>>.
>>- Apache Mesos secrets documentation
>>
>>
>> Could you please confirm that the following assumptions are correct?
>>
>>- Mesos agent and master will never log the secret data at any
>>logging level;
>>- Mesos agent and master will never expose the secret data as part of
>>any API response;
>>- Mesos agent and master will never store the secret in any
>>persistent storage, but only on tmpfs or ramfs;
>>- When the secret is first downloaded on the mesos agent, it will be
>>stored as "root" on the tmpfs/ramfs before being mounted in the container
>>ramfs.
>>
>> If above assumptions are true, then I would like to see them documented
>> in this as part of the Apache Mesos secrets documentation
>> . Otherwise, we'd
>> like to have a design discussion with maintainer of the isolator.
>>
>> We appreciate your help regarding this. Thanks!
>>
>> Regards,
>> Aditya And Zhitao
>>
>>
>>
>
>
> --
> Cheers,
>
> Zhitao Li
>


Re: This Month in Mesos - March 2018

2018-03-30 Thread Vinod Kone
Thanks for the update Greg!

Sent from my phone

> On Mar 30, 2018, at 3:08 PM, Greg Mann  wrote:
> 
> Oh hai there Apache Mesos Community!
> 
> Back again with your monthly update on current events in the Mesosverse:
> 
> 
> *Working Groups*
> 
> Below you'll find a brief summary of the group meetings from this past
> month, as well as some info about related work that's been happening in the
> project. Working group meetings can be found on the Mesos community calendar
> , and you should feel
> free to add agenda items beforehand!
> 
> 
> *API Working Group*
> 
> [Agenda Doc
> 
> ]
> 
> Next Meeting: April 3 @ 11am PST
> 
> In March we held the first two meetings of the new API working group! This
> has brought about a revival of our perennial discussion on the preferred
> Mesos release cadence; you can expect an updated release policy in our
> documentation shortly. It's looking like the new policy will be in line
> with what we have been doing in practice for the last few releases, so no
> big changes there.
> 
> 
> Zhitao also presented his ongoing work on new operations which will allow
> the growing/shrinking of persistent volumes. You can find his design doc
> here
> 
> .
> 
> 
> *Containerization Working Group*
> 
> [Agenda Doc
> 
> ]
> 
> Next meeting: April 5 @ 9am PST
> 
> Two big items in the containerization space this month:
> 
> 
>   - Improvements to the Docker containerizer/executor to more gracefully
>   handle bugs in the Docker daemon: MESOS-8572
>   
>   - Configurable network namespaces for nested containers: MESOS-8534
>   
> 
> *Community Working Group*
> 
> [Agenda Doc
> 
> ]
> 
> Next Meeting: April 9 @ 10:30am PST
> 
> Community working group had a preliminary discussion about the next
> quarterly doc-a-thon, and discussed the possibility of spinning up a new
> Releases Working Group. We also discussed plans for the next MesosCon, and
> how we may want to evolve that event going forward.
> 
> 
> *Performance Working Group*
> 
> [Agenda Doc
> 
> ]
> 
> Next meeting: April 18 @ 10am PST
> 
> We now have a performance dashboard
> 
> which lets you view tickets in ASF JIRA which have been marked as
> performance-related - take a look!
> 
> 
> Some additional copy elimination
>  patches have been
> merged, with more yet to come. The group also discussed the near-term
> performance roadmap, which includes optimization of
> authentication/authorization, master state computation, and the libprocess
> HTTP code; see the agenda document for more details.
> 
> 
> 
> Until next time,
> -Greg


Re: Release policy and 1.6 release schedule

2018-03-23 Thread Vinod Kone
I’m +1 for quarterly. 

Most importantly I want us to adhere to a predictable cadence. 

Sent from my phone

> On Mar 23, 2018, at 9:21 PM, Jie Yu  wrote:
> 
> It's a burden for supporting multiple releases.
> 
> 1.2 was released March, 2017 (1 year ago), and I know that some users are 
> still on that version
> 1.3 was released June, 2017 (9 months ago), and we're still maintaining it 
> (still backport patches several days ago, which some users asked)
> 1.4 was released Sept, 2017 (6 months ago).
> 1.5 was released Feb, 2018 (1 month ago).
> 
> As you can see, users expect a release to be supported 6-9 months (e.g., 
> backports are still needed for 1.3 release, which is 9 months old). If we 
> were to do monthly minor releases, we'd probably need to maintain 6-9 release 
> branches? That's too much of an ask for committers and maintainers.
> 
> I also agree with folks that there're benefits doing releases more 
> frequently. Given the historical data, I'd suggest we do quarterly releases, 
> and maintain three release branches.
> 
> - Jie
> 
>> On Fri, Mar 23, 2018 at 10:03 AM, Greg Mann  wrote:
>> The best motivation I can think of for a shorter release cycle is this: if
>> the release cadence is fast enough, then developers will be less likely to
>> rush a feature into a release. I think this would be a real benefit, since
>> rushing features in hurts stability. *However*, I'm not sure if every two
>> months is fast enough to bring this benefit. I would imagine that a
>> two-month wait is still long enough that people wouldn't want to wait an
>> entire release cycle to land their feature. Just off the top of my head, I
>> might guess that a release cadence of 1 month or shorter would be often
>> enough that it would always seem reasonable for a developer to wait until
>> the next release to land a feature. What do y'all think?
>> 
>> Other motivating factors that have been raised are:
>> 1) Many users upgrade on a longer timescale than every ~2 months. I think
>> that this doesn't need to affect our decision regarding release timing -
>> since we guarantee compatibility of all releases with the same major
>> version number, there is no reason that a user needs to upgrade minor
>> releases one at a time. It's fine to go from 1.N to 1.(N+3), for example.
>> 2) Backporting will be a burden if releases are too short. I think that in
>> practice, backporting will not take too much longer. If there was a
>> conflict back in the tree somewhere, then it's likely that after resolving
>> that conflict once, the same diff can be used to backport the change to
>> previous releases as well.
>> 3) Adhering strictly to a time-based release schedule will help users plan
>> their deployments, since they'll be able to rely on features being released
>> on-schedule. However, if we do strict time-based releases, then it will be
>> less certain that a particular feature will land in a particular release,
>> and users may have to wait a release cycle to get the feature.
>> 
>> Personally, I find the idea of preventing features from being rushed into a
>> release very compelling. From that perspective, I would love to see
>> releases every month. However, if we're not going to release that often,
>> then I think it does make sense to adjust our release schedule to
>> accommodate the features that community members want to land in a
>> particular release.
>> 
>> 
>> Jie, I'm curious why you suggest a *minimal* interval between releases.
>> Could you elaborate a bit on your motivations there?
>> 
>> Cheers,
>> Greg
>> 
>> 
>> On Fri, Mar 16, 2018 at 2:01 PM, Jie Yu  wrote:
>> 
>> > Thanks Greg for starting this thread!
>> >
>> >
>> >> My primary motivation here is to bring our documented policy in line
>> >> with our practice, whatever that may be
>> >
>> >
>> > +100
>> >
>> > Do people think that we should attempt to bring our release cadence more
>> >> in line with our current stated policy, or should the policy be changed
>> >> to reflect our current practice?
>> >
>> >
>> > I think a minor release every 2 months is probably too aggressive. I don't
>> > have concrete data, but my feeling is that the frequency that folks upgrade
>> > Mesos is low. I know that many users are still on 1.2.x.
>> >
>> > I'd actually suggest that we have a *minimal* interval between two
>> > releases (e.g., 3 months), and provide some buffer for the release process.
>> > (so we're expecting about 3 releases per year, this matches what we did
>> > last year).
>> >
>> > And we use our dev sync to coordinate on a release after the minimal
>> > release interval has elapsed (and elect a release manager).
>> >
>> > - Jie
>> >
>> > On Wed, Mar 14, 2018 at 9:51 AM, Zhitao Li  wrote:
>> >
>> >> An additional data point is how long it takes from first RC being cut to
>> >> the final release tag vote passes. That probably indicates smoothness of
>> >> the release process 

Re: [VOTE] Release Apache Mesos 1.5.0 (rc2)

2018-02-05 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. The red builds were known flaky tests regarding
checks/health checks.

*Revision*: f7e3872b0359c6095f8eeaefe408cb7dcef5bb83

   - refs/tags/1.5.0-rc2

Configuration Matrix gcc clang
centos:7 --verbose --enable-libevent --enable-ssl autotools
[image: Failed]

[image: Not run]
cmake
[image: Success]

[image: Not run]
--verbose autotools
[image: Failed]

[image: Not run]
cmake
[image: Success]

[image: Not run]
ubuntu:14.04 --verbose --enable-libevent --enable-ssl autotools
[image: Success]

[image: Success]

cmake
[image: Success]

[image: Success]

--verbose autotools
[image: Success]

[image: Success]

cmake
[image: Success]

[image: Success]


On Sat, Feb 3, 2018 at 11:11 AM, Zhitao Li  wrote:

> +1 (non-binding)
>
> Tested with running all tests on Debian/jessie server on AWS.
>
> On Fri, Feb 2, 2018 at 3:25 PM, Jie Yu  wrote:
>
>> +1
>>
>> Verified in our internal CI that `sudo make check` passed in CentOS 6,
>> CentOS7, Debian 8, Ubuntu 14.04, Ubuntu 16.04 (both w/ or w/o SSL
>> enabled).
>>
>>
>> On Thu, Feb 1, 2018 at 5:36 PM, Gilbert Song  wrote:
>>
>> > Hi all,
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 1.5.0.
>> >
>> > 1.5.0 includes the following:
>> > 
>> > 
>> >   * Support Container Storage Interface (CSI).
>> >   * Agent reconfiguration policy.
>> >   * Auto GC docker images in Mesos Containerizer.
>> >   * Standalone containers.
>> >   * Support gRPC client.
>> >   * Non-leading VOTING replica catch-up.
>> >
>> >
>> > The CHANGELOG for the release is available at:
>> > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
>> > lain;f=CHANGELOG;hb=1.5.0-rc2
>> > 
>> > 

Re: This Month in Mesos - January 2018

2018-01-30 Thread Vinod Kone
Thanks Greg for the update!


*Medium Posts*
> Going forward, we'll be cross-posting Apache Mesos blog posts on Medium to
> increase exposure, get feedback on engagement, and make it easier to share.
> Check out the new Apache Mesos publication on
> Medium for the latest content!
>


I'm really looking forward to this. As a reminder, if any one in the
community wants to blog (cross post) about their use of Apache Mesos please
reach out to us. We are always looking for great content!


Re: Questions about Pods and the Mesos Containerizer

2018-01-29 Thread Vinod Kone
Hi David,

It's probably worth having a synchronous discussion around your proposed
approach in our slack. I would like to understand if TASK_GROUP is the
right primitive for your use case.

On Mon, Jan 29, 2018 at 1:32 PM, David Morrison  wrote:

>
>
> On Thu, Jan 25, 2018 at 5:49 PM, Gilbert Song 
> wrote:
>
>>
>>>-
>>>
>>>Is it possible to allocate a separate IP address per container in a
>>>pod?
>>>
>>> Right now nested containers share the network from their parent
>> container (pod). Do we have a specific use case that requires containers
>> inside of a task group to have different IP addresses?
>>
>
> For our use case, we need to be able to launch a relatively large number
> of containers inside a taskgroup that all listen on the same port (and the
> port is not easily-changeable).  So we need to be able to assign different
> IPs to the containers so they don't conflict.
>
> Cheers,
> David
>


Re: Reservation status monitoring

2018-01-18 Thread Vinod Kone
The agents tab in the Mesos WebUI should have a table of per-role
reservations. This is a new feature, so you might need to upgrade to the
latest version to get it.

On Tue, Jan 16, 2018 at 9:23 PM, 박도형  wrote:

> Hi Folks,
>
>
>
> Is there an easy way to see the current overall status of resource
> reservations in the Mesos Master?
>
> I want to monitor status such as reserved resources per agent or per role
> with the Mesos Web UI.
>
>
>
> DH Park.
>
>
>
> *Dohyeong Park, Engineer*
>
> Cloud Platform Group, Mobile R Office
>
> Mobile Communication Business
>
> *SAMSUNG ELECTRONICS CO,.LTD.*
>
> *E-mail.* doit.p...@samsung.com
>
>
>
>
>


Re: java driver/shutdown call

2018-01-17 Thread Vinod Kone
Mohit, you can reach out to us in #http-api channel in slack
<http://mesos.apache.org/community/> if you want to have a quick discussion.

On Wed, Jan 17, 2018 at 2:06 PM, Benjamin Mahler <bmah...@apache.org> wrote:

> Can you tell us more about what the use case is? Why do you think it's
> more robust?
>
> On Tue, Jan 16, 2018 at 8:41 PM Mohit Jaggi <mohit.ja...@uber.com> wrote:
>
>> I am trying to change Apache Aurora's code to call SHUTDOWN instead of
>> KILL. SHUTDOWN seems to offer more robust termination than KILL.
>>
>> On Tue, Jan 16, 2018 at 6:40 PM, Benjamin Mahler <bmah...@apache.org>
>> wrote:
>>
>>> Mohit, what are you trying to accomplish by going from KILL to SHUTDOWN?
>>>
>>> On Tue, Jan 16, 2018 at 5:15 PM, Joseph Wu <jos...@mesosphere.io> wrote:
>>>
>>>> If a framework launches tasks, then it will use an executor.  Mesos
>>>> provides a "default" executor if the framework doesn't explicitly specify
>>>> an executor.  (And the Shutdown call will work with that default executor.)
>>>>
>>>> On Tue, Jan 16, 2018 at 4:49 PM, Mohit Jaggi <mohit.ja...@uber.com>
>>>> wrote:
>>>>
>>>>> Gotcha. Another question: if a framework doesn't use executors, can it
>>>>> still use the SHUTDOWN call?
>>>>>
>>>>> On Fri, Jan 12, 2018 at 2:37 PM, Anand Mazumdar <
>>>>> mazumdar.an...@gmail.com> wrote:
>>>>>
>>>>>> Yes; It's a newer interface that still allows you to switch between
>>>>>> the v1 (new) and the old API.
>>>>>>
>>>>>> -anand
>>>>>>
>>>>>> On Fri, Jan 12, 2018 at 3:28 PM, Mohit Jaggi <mohit.ja...@uber.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Are you suggesting
>>>>>>>
>>>>>>> *send(new Call(METHOD, Param1, ...)) *
>>>>>>>
>>>>>>> instead of
>>>>>>>
>>>>>>> *driver.method(Param1, )*
>>>>>>>
>>>>>>> *?*
>>>>>>>
>>>>>>> On Fri, Jan 12, 2018 at 10:59 AM, Anand Mazumdar <
>>>>>>> mazumdar.an...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Mohit,
>>>>>>>>
>>>>>>>> You can use the V1Mesos class that uses the v1 API internally
>>>>>>>> allowing you to send the 'SHUTDOWN' call. We also have a V0Mesos class 
>>>>>>>> that
>>>>>>>> uses the old scheduler driver internally.
>>>>>>>>
>>>>>>>> -anand
>>>>>>>>
>>>>>>>> On Wed, Jan 10, 2018 at 2:53 PM, Mohit Jaggi <mohit.ja...@uber.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Vinod. Is there a V1SchedulerDriver.java file? I see
>>>>>>>>> https://github.com/apache/mesos/tree/
>>>>>>>>> 72752fc6deb8ebcbfbd5448dc599ef3774339d31/src/java/src/org/
>>>>>>>>> apache/mesos/v1/scheduler but it does not have a V1 driver.
>>>>>>>>>
>>>>>>>>> On Fri, Jan 5, 2018 at 3:59 PM, Vinod Kone <vinodk...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> That's right. It is only available for v1 schedulers.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 5, 2018 at 3:38 PM, Mohit Jaggi <mohit.ja...@uber.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Folks,
>>>>>>>>>>> I am trying to change Apache Aurora's code to call SHUTDOWN
>>>>>>>>>>> instead of KILL. However, it seems that the SchedulerDriver class 
>>>>>>>>>>> in Mesos
>>>>>>>>>>> does not have a shutdownExecutor() call.
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/apache/mesos/blob/
>>>>>>>>>>> 72752fc6deb8ebcbfbd5448dc599ef3774339d31/src/java/src/org/
>>>>>>>>>>> apache/mesos/SchedulerDriver.java
>>>>>>>>>>>
>>>>>>>>>>> Mohit.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Anand Mazumdar
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Anand Mazumdar
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>


Re: Mesos rare TASK_LOST scenario v 0.21.0

2018-01-10 Thread Vinod Kone
The command executor was probably fixed somewhere between 0.21 and 1.3. The
only reason I mentioned 1.3+ is because any releases before that are out of
support period. If you can repro the issue with 1.3+ and paste the logs
here or in a JIRA, we can help debug it for you.

On Wed, Jan 10, 2018 at 9:47 AM, Ajay V <ajayv...@gmail.com> wrote:

> Thanks for getting back Vinod. So, does that mean that even for v1.2,
> these race conditions (where the command executor doesn't stay long enough)
> existed, and that 1.3 fixes them? The reason for asking is that
> I did try an upgrade to v1.2 and still found very similar issues.
>
> Regards,
> Ajay
>
> On Tue, Jan 9, 2018 at 6:48 PM, Vinod Kone <vinodk...@apache.org> wrote:
>
>> 0.21 is really old and not supported. I highly recommend you upgrade to
>> 1.3+.
>>
>> Regarding what you are seeing, we definitely had issues in the past where
>> the command executor didn't stay up long enough to guarantee that
>> TASK_FINISHED was delivered to the agent; so races like above were possible.
>>
>> On Tue, Jan 9, 2018 at 5:33 PM, Ajay V <ajayv...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I'm trying to debug a TASK_LOST that's generated on the agent that I see
>>> on rare occasions.
>>>
>>> Following is a log that I'm trying to understand. This is happening
>>> after the driver.sendStatusUpdate() has been called with a task state of
>>> TASK_FINISHED from a java executor. It looks to me like the container is
>>> already exited before the TASK_FINISHED  is processed. Is there a timing
>>> issue here in this version of mesos that is causing this? The effect of
>>> this problem is that, even though the work of the executor is complete and
>>> the executor calls the sendStatusUpdate with a TASK_FINISHED, the task is
>>> marked as LOST and the actual update of TASK_FINISHED is ignored.
>>>
>>> I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for
>>> container 'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited
>>>
>>> I0108 10:16:51.388741 37272 containerizer.cpp:946] Destroying container
>>> 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>>>
>>> W0108 10:16:52.159241 37260 posix.hpp:192] No resource usage for unknown
>>> container 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>>>
>>> W0108 10:16:52.803463 37255 containerizer.cpp:888] Skipping resource
>>> statistic for container bb0e5f2d-4bdb-479c-b829-4741993c4109 because:
>>> Failed to get usage: No process found at 28952
>>>
>>> I0108 10:16:52.899657 37278 slave.cpp:2898] Executor
>>> 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
>>> 20171208-050805-140555025-5050-3470- exited with status 0
>>>
>>> I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update
>>> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>>> 20171208-050805-140555025-5050-3470- from @0.0.0.0:0
>>>
>>> I0108 10:16:52.901978 37278 slave.cpp:4305] Terminating task
>>> ff631ad1-cfab-493e-be18-961581abcf3d
>>>
>>> W0108 10:16:52.902793 37274 containerizer.cpp:852] Ignoring update for
>>> unknown container: bb0e5f2d-4bdb-479c-b829-4741993c4109
>>>
>>> I0108 10:16:52.903230 37274 status_update_manager.cpp:317] Received
>>> status update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5)
>>> for task ff631ad1-cfab-493e-be18-961581abcf3d of framework
>>> 20171208-050805-140555025-5050-3470-
>>>
>>> I0108 10:16:52.904119 37274 status_update_manager.cpp:371] Forwarding
>>> update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>>> 20171208-050805-140555025-5050-3470- to the slave
>>>
>>> I0108 10:16:52.905725 37282 slave.cpp:2458] Forwarding the update
>>> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>>> 20171208-050805-140555025-5050-3470- to master@17.179.96.8:5050
>>>
>>> I0108 10:16:52.906025 37282 slave.cpp:2385] Status update manager
>>> successfully handled status update TASK_LOST (UUID:
>>> f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
>>> ff631ad1-cfab-493e-be18-961581abcf3d of framework
>>> 20171208-050805-140555025-5050-3470-
>>>
>>> I0108 10:16:52.956588 37280 status_update_manager.cpp:389] Received
>>> status update acknowledge

Re: Mesos rare TASK_LOST scenario v 0.21.0

2018-01-09 Thread Vinod Kone
0.21 is really old and not supported. I highly recommend you upgrade to
1.3+.

Regarding what you are seeing, we definitely had issues in the past where
the command executor didn't stay up long enough to guarantee that
TASK_FINISHED was delivered to the agent; so races like above were possible.

On Tue, Jan 9, 2018 at 5:33 PM, Ajay V  wrote:

> Hello,
>
> I'm trying to debug a TASK_LOST that's generated on the agent that I see on
> rare occasions.
>
> Following is a log that I'm trying to understand. This is happening after
> the driver.sendStatusUpdate() has been called with a task state of
> TASK_FINISHED from a java executor. It looks to me like the container is
> already exited before the TASK_FINISHED  is processed. Is there a timing
> issue here in this version of mesos that is causing this? The effect of
> this problem is that, even though the work of the executor is complete and
> the executor calls the sendStatusUpdate with a TASK_FINISHED, the task is
> marked as LOST and the actual update of TASK_FINISHED is ignored.
>
> I0108 10:16:51.388300 37272 containerizer.cpp:1117] Executor for container
> 'bb0e5f2d-4bdb-479c-b829-4741993c4109' has exited
>
> I0108 10:16:51.388741 37272 containerizer.cpp:946] Destroying container
> 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>
> W0108 10:16:52.159241 37260 posix.hpp:192] No resource usage for unknown
> container 'bb0e5f2d-4bdb-479c-b829-4741993c4109'
>
> W0108 10:16:52.803463 37255 containerizer.cpp:888] Skipping resource
> statistic for container bb0e5f2d-4bdb-479c-b829-4741993c4109 because:
> Failed to get usage: No process found at 28952
>
> I0108 10:16:52.899657 37278 slave.cpp:2898] Executor
> 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
> 20171208-050805-140555025-5050-3470- exited with status 0
>
> I0108 10:16:52.901736 37278 slave.cpp:2215] Handling status update
> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470- from @0.0.0.0:0
>
> I0108 10:16:52.901978 37278 slave.cpp:4305] Terminating task
> ff631ad1-cfab-493e-be18-961581abcf3d
>
> W0108 10:16:52.902793 37274 containerizer.cpp:852] Ignoring update for
> unknown container: bb0e5f2d-4bdb-479c-b829-4741993c4109
>
> I0108 10:16:52.903230 37274 status_update_manager.cpp:317] Received status
> update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-
>
> I0108 10:16:52.904119 37274 status_update_manager.cpp:371] Forwarding
> update TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470- to the slave
>
> I0108 10:16:52.905725 37282 slave.cpp:2458] Forwarding the update
> TASK_LOST (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task
> ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470- to master@17.179.96.8:5050
>
> I0108 10:16:52.906025 37282 slave.cpp:2385] Status update manager
> successfully handled status update TASK_LOST (UUID: 
> f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5)
> for task ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-
>
> I0108 10:16:52.956588 37280 status_update_manager.cpp:389] Received status
> update acknowledgement (UUID: f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for
> task ff631ad1-cfab-493e-be18-961581abcf3d of framework
> 20171208-050805-140555025-5050-3470-
>
> I0108 10:16:52.956841 37280 status_update_manager.cpp:525] Cleaning up
> status update stream for task ff631ad1-cfab-493e-be18-961581abcf3d of
> framework 20171208-050805-140555025-5050-3470-
>
> I0108 10:16:52.957608 37268 slave.cpp:1800] Status update manager
> successfully handled status update acknowledgement (UUID:
> f2bf0aa2-d465-4ced-8cea-06bc1d3f38c5) for task 
> ff631ad1-cfab-493e-be18-961581abcf3d
> of framework 20171208-050805-140555025-5050-3470-
>
> I0108 10:16:52.958693 37268 slave.cpp:4344] Completing task
> ff631ad1-cfab-493e-be18-961581abcf3d
>
> I0108 10:16:52.960364 37268 slave.cpp:3007] Cleaning up executor
> 'ff631ad1-cfab-493e-be18-961581abcf3d' of framework
> 20171208-050805-140555025-5050-3470-
>
> Regards,
> Ajay
>


Re: [Community WG] Cancel Jan 15th meeting for US holiday

2018-01-05 Thread Vinod Kone
SGTM 

@vinodkone

> On Jan 5, 2018, at 4:43 PM, Judith Malnick  wrote:
> 
> Hi everyone,
> 
> I'd like to propose canceling the January 15th Community working group 
> meeting because of Martin Luther King Jr. Day in the US. The next meeting 
> would then be January 29th, 2018. Let me know what you think. 
> 
> Best! 
> Judith
> 
> -- 
> Judith Malnick
> Community Manager
> 310-709-1517


Re: java driver/shutdown call

2018-01-05 Thread Vinod Kone
That's right. It is only available for v1 schedulers.

On Fri, Jan 5, 2018 at 3:38 PM, Mohit Jaggi  wrote:

> Folks,
> I am trying to change Apache Aurora's code to call SHUTDOWN instead of
> KILL. However, it seems that the SchedulerDriver class in Mesos does not
> have a shutdownExecutor() call.
>
> https://github.com/apache/mesos/blob/72752fc6deb8ebcbfbd5448dc599ef
> 3774339d31/src/java/src/org/apache/mesos/SchedulerDriver.java
>
> Mohit.
>


Re: [VOTE] Release Apache Mesos 1.3.2 (rc1)

2017-12-15 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. The red builds are due to flaky tests caused by the known
perf core-dump issue, which has since been fixed.
*Revision*: 17ab9ff8b35f5ec877f08698e28301bec030d010

   - refs/tags/1.3.2-rc1

Configuration Matrix                                     gcc      clang
centos:7      --verbose --enable-libevent --enable-ssl
                autotools                                Success  Not run
                cmake                                    Success  Not run
              --verbose
                autotools                                Success  Not run
                cmake                                    Success  Not run
ubuntu:14.04  --verbose --enable-libevent --enable-ssl
                autotools                                Failed   Failed
                cmake                                    Success  Success
              --verbose
                autotools                                Failed   Failed
                cmake                                    Success  Success


On Thu, Dec 14, 2017 at 5:38 PM, Benjamin Mahler  wrote:

> +1 (binding)
>
> make check passes on macOS 10.13.2 with Apple LLVM version 9.0.0
> (clang-900.0.39.2)
>
> On Thu, Dec 7, 2017 at 2:44 PM, Michael Park  wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.3.2.
>>
>> The CHANGELOG for the release is available at:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
>> lain;f=CHANGELOG;hb=1.3.2-rc1
>> 
>> 
>>
>> The candidate for Mesos 1.3.2 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.2-rc1/mesos-1.3.2.tar.gz
>>
>> The tag to be voted on is 1.3.2-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.3.2-rc1
>>
>> The MD5 checksum of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.2-rc1/mesos
>> -1.3.2.tar.gz.md5
>>
>> The signature of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.2-rc1/mesos
>> -1.3.2.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> 

Re: This Month in Mesos - December 2017

2017-12-12 Thread Vinod Kone
Thanks for the update, Greg!

On Tue, Dec 12, 2017 at 3:51 PM, Greg Mann  wrote:

> Dear Apache Mesos Community,
>
> Development in Mesos has been active lately, with work taking place to
> enable things like hybrid cloud and network storage support, as well as
> improvements to the scheduler API designed to make the lives of framework
> developers easier.
>
> Apache Mesos version 1.5 is just around the corner; we hope to cut the
> first release candidate (RC) within the next week, so keep your eyes peeled
> on the mailing lists! As always, your help testing this release during the
> RC phase is greatly appreciated.
>
> We've also scheduled our 2nd quarterly Doc-A-thon on January 11th, hosted
> at Mesosphere HQ in San Francisco! You'll be able to join remotely using
> Zoom, or in person. To see an event description and RSVP, please visit the
> Meetup page.
>
>
> Last week I sat down with long-time Mesos committer Jie Yu to discuss the
> storage effort that he’s been leading. Find the interview on the Mesos
> Blog. And if you haven’t yet checked out
> Ben Mahler’s recent performance working group progress report, you can find
> it there as well!
>
> Going forward, we’ll endeavor to bring you monthly updates like this on
> the latest progress in the project. Next Month, we’ll celebrate the new
> release and go into detail on the exciting new features that made it in.
>
> If you have anything you'd like to share in the newsletter (blog posts,
> calls for contribution, announcements) please email me or join the
> community working group, which meets every other week on Monday at 10:30 am
> Pacific time using Zoom. The next meeting
> will be on December 18th.
>
> Best,
>
> Greg
>


Re: Welcome Andrew Schwartzmeyer as a new committer and PMC member!

2017-11-27 Thread Vinod Kone
Congrats Andy! Well deserved!

On Mon, Nov 27, 2017 at 6:54 PM, Alex Evonosky 
wrote:

> Welcome Andy!  Congratulations!
>
> On Mon, Nov 27, 2017 at 9:48 PM, Benjamin Mahler 
> wrote:
>
>> Welcome and thanks for your contributions so far!
>>
>> On Mon, Nov 27, 2017 at 11:00 PM, Joseph Wu  wrote:
>>
>>> Hi devs & users,
>>>
>>> I'm happy to announce that Andrew Schwartzmeyer has become a new
>>> committer and member of the PMC for the Apache Mesos project.  Please join
>>> me in congratulating him!
>>>
>>> Andrew has been an active contributor to Mesos for about a year.  He has
>>> been the primary contributor behind our efforts to change our default build
>>> system to CMake and to port Mesos onto Windows.
>>>
>>> Here is his committer candidate checklist for your perusal:
>>> https://docs.google.com/document/d/1MfJRYbxxoX2-A-g8NEeryUdU
>>> i7FvIoNcdUbDbGguH1c/
>>>
>>> Congrats Andy!
>>> ~Joseph
>>>
>>
>>
>


Re: Is it possible to configure a mesos agent to use multiple work directories?

2017-11-22 Thread Vinod Kone
I see. One option would be to expose multiple disks as resources to
frameworks and have them use that. The task sandboxes (and other metadata)
will still be located in `work_dir`, but most of the tasks' I/O could be
directed towards those disks. Of course, this needs changes to frameworks
which is not ideal.
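As a sketch of that option: the agent's `--resources` flag can expose each physical disk as a separate PATH disk resource. The mount points and sizes below are made up; treat this as an illustration, not a recommended layout.

```python
# Sketch: build an agent --resources value exposing several disks as
# separate PATH disk resources (sizes in MB). Paths/sizes are examples only.
import json

def disk_resources(mounts):
    """mounts: {mount_point: size_mb} -> list of Mesos disk resource dicts."""
    return [
        {
            "name": "disk",
            "type": "SCALAR",
            "scalar": {"value": size_mb},
            "disk": {"source": {"type": "PATH", "path": {"root": root}}},
        }
        for root, size_mb in sorted(mounts.items())
    ]

flag = "--resources=" + json.dumps(
    disk_resources({"/mnt/data1": 102400, "/mnt/data2": 102400}))
```

Frameworks would then see distinct disk resources in offers and could direct I/O-heavy tasks at specific mounts.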

On Wed, Nov 22, 2017 at 1:57 PM, Jeff Kubina <jeff.kub...@gmail.com> wrote:

> Thanks, that is what I thought.
>
> Why: To spread the I/O-workload of some frameworks across many disks.
>
> --
> Jeff Kubina
> 410-988-4436
>
>
> On Wed, Nov 22, 2017 at 2:21 PM, Vinod Kone <vinodk...@apache.org> wrote:
>
>> No. Why do you need that?
>>
>> On Wed, Nov 22, 2017 at 10:42 AM, Jeff Kubina <jeff.kub...@gmail.com>
>> wrote:
>>
>>> Is it possible to configure a mesos agent to use multiple work
>>> directories (the work_dir parameter)?
>>>
>>>
>>
>


Re: Is it possible to configure a mesos agent to use multiple work directories?

2017-11-22 Thread Vinod Kone
No. Why do you need that?

On Wed, Nov 22, 2017 at 10:42 AM, Jeff Kubina  wrote:

> Is it possible to configure a mesos agent to use multiple work directories
> (the work_dir parameter)?
>
>


Re: Can't really understand how do Executors can be injected...

2017-11-22 Thread Vinod Kone
If you have an executor running on an agent, wait for an offer from *that
agent* and launch a new task with the *same* ExecutorInfo as the one you
originally used to launch the executor. In this case, mesos will not launch
a new executor but passes the task to the already running executor. Note
that, if for some reason your original executor died just as you were
launching the new task, mesos will launch a new instance of that executor
and pass the new task. So your executor needs to handle this race.
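To make the ExecutorInfo-reuse concrete, here is a rough sketch of a v1 scheduler ACCEPT call that launches a new task on an already-running executor. The executor command, IDs, and resources are placeholders, not from a real framework.

```python
# Sketch: build a v1 scheduler ACCEPT call that reuses a running executor's
# ExecutorInfo, so the agent routes the task to that executor instead of
# launching a new one. All IDs and commands are placeholders.
import copy

EXECUTOR_INFO = {  # must match the ExecutorInfo used for the original launch
    "executor_id": {"value": "executor1"},
    "framework_id": {"value": "fw-placeholder"},
    "command": {"value": "./my-executor"},
}

def launch_on_existing_executor(offer, task_id, resources):
    task = {
        "name": task_id,
        "task_id": {"value": task_id},
        "agent_id": offer["agent_id"],  # offer must come from the executor's agent
        "resources": resources,
        "executor": copy.deepcopy(EXECUTOR_INFO),  # same ExecutorInfo => reuse
    }
    return {
        "framework_id": EXECUTOR_INFO["framework_id"],
        "type": "ACCEPT",
        "accept": {
            "offer_ids": [offer["id"]],
            "operations": [{"type": "LAUNCH", "launch": {"task_infos": [task]}}],
        },
    }
```

A framework would POST this call body to the master's `/api/v1/scheduler` endpoint on its subscribed connection.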

On Wed, Nov 22, 2017 at 10:45 AM, Alex Kotelnikov <
alex.kotelni...@diginetica.com> wrote:

> Vinod,
>
> much more clear. Thanks.
>
> I refined first question inline.
>
> On 22 November 2017 at 21:15, Vinod Kone <vinodk...@apache.org> wrote:
>
>> Hi Alex,
>>
>> See my answers below
>>
>> 1. Launch a task without accepting an offer (on already existing
>>> executor).
>>>
>>
>> This is not currently possible. Every task needs some non-zero resources,
>> and hence an offer, to be launched. What's your use case?
>>
>
> Basically, if I have an executor running, how do I launch a task on it?
>
>
> --
>
> Best Regards,
>
>
> *Alexander Kotelnikov*
>
> *Team Lead*
>
> DIGINETICA
> Retail Technology Company
>
> m: +7.921.915.06.28
>
> *www.diginetica.com <http://www.diginetica.com/>*
>


Re: Can't really understand how do Executors can be injected...

2017-11-22 Thread Vinod Kone
Hi Alex,

See my answers below

1. Launch a task without accepting an offer (on already existing executor).
>

This is not currently possible. Every task needs some non-zero resources,
and hence an offer, to be launched. What's your use case?


> 2. Initiate an executor with no tasks (to launch them later).
>

The only way to do it is to launch an executor with a dummy task. If you
are writing your own executor, you can interpret the dummy task as you wish
(e.g., no-op). There is no API to launch an executor without a task yet.


> 3. How actually to introduce your own executor with V1 protocol.
>

See the executor API doc
 on how
to write one. Once your executor binary exists, upload it in a storage
location somewhere (hdfs, http server, etc.) and pass its location as a URI in
ExecutorInfo. See example here

.
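As a rough illustration of point 3 (with placeholder IDs and a made-up HDFS path, not a real deployment): an ExecutorInfo in v1 JSON form whose CommandInfo URI points at the uploaded binary, which the fetcher downloads into the sandbox before the command runs.

```python
# Sketch: an ExecutorInfo (v1 JSON form) referencing an uploaded executor
# binary via a CommandInfo URI. The fetcher downloads the URI into the
# sandbox; the command then runs the fetched binary.
def make_executor_info(executor_id, framework_id, binary_uri):
    binary = binary_uri.rsplit("/", 1)[-1]  # filename of the fetched binary
    return {
        "executor_id": {"value": executor_id},
        "framework_id": {"value": framework_id},
        "command": {
            "value": "./" + binary,
            "uris": [{"value": binary_uri, "executable": True}],
        },
    }
```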

HTH,


>
> Could you please point me at line in the API, specification, code,
> whatever.
>
> Thanks a lot,
>
>
> --
>
> Best Regards,
>
>
> *Alexander Kotelnikov*
>
> *Team Lead*
>
> DIGINETICA
> Retail Technology Company
>
> m: +7.921.915.06.28
>
> *www.diginetica.com *
>


Re: [VOTE] Release Apache Mesos 1.2.3 (rc1)

2017-11-21 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. The failures are due to two issues: 1) the perf core dump,
which was fixed in 1.5.0, and 2) a flaky oversubscription test, also fixed in
1.5.0.

*Revision*: 7559c9352c78912526820f6222ed2b17ad3b19cf

   - refs/tags/1.2.3-rc1

Configuration Matrix                                     gcc      clang
centos:7      --verbose --enable-libevent --enable-ssl
                autotools                                Success  Not run
                cmake                                    Success  Not run
              --verbose
                autotools                                Success  Not run
                cmake                                    Success  Not run
ubuntu:14.04  --verbose --enable-libevent --enable-ssl
                autotools                                Failed   Failed
                cmake                                    Failed   Success
              --verbose
                autotools                                Success  Failed
                cmake                                    Success  Success


On Wed, Nov 15, 2017 at 9:57 PM, Adam Bordelon  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.2.3.
> 1.2.3 is our last scheduled bug fix release in the 1.2.x branch.
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_
> plain;f=CHANGELOG;hb=1.2.3-rc1
> 
> 
>
> The candidate for Mesos 1.2.3 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.2.3-rc1/mesos-1.2.3.tar.gz
>
> The tag to be voted on is 1.2.3-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.2.3-rc1
>
> The MD5 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.2.3-rc1/
> mesos-1.2.3.tar.gz.md5
>
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.2.3-rc1/
> mesos-1.2.3.tar.gz.asc
>
> The PGP key used to sign the release is here:
> 

Re: Quarterly Doc-a-thon Scheduling

2017-11-20 Thread Vinod Kone
SGTM!

On Mon, Nov 20, 2017 at 10:13 AM, Judith Malnick 
wrote:

> Hi all,
>
> Since the last doc-a-thon was so great, I'd love to get another on the
> calendar for next quarter.
>
> The last one was on the 2nd Thursday of October, so it might make sense to
> have the next one on the 2nd Thursday of January. (*January 11th,
> 3-8pm Pacific time*).
>
> Does this date sound good to you? If I get many nos I'll follow up with a
> doodle poll.
>
> All the best!
> Judith
>
> --
> Judith Malnick
> Community Manager
> 310-709-1517
>


Re: Adding a new agent terminates existing executors?

2017-11-15 Thread Vinod Kone
Yes, there are a bunch of flags that need to be different. Even then, there
are likely some isolators that will not work correctly when you have multiple
agents on the same host; the garbage collector, for example, assumes it has
sole access to the disk containing the work dir.

In general, running multiple agents on the same host is not tested and is
not recommended at all for production. For testing purposes, I would
recommend putting agents on different VMs.
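For anyone who still wants to experiment with co-located agents despite the caveats above, the flags that surfaced in this thread need distinct values per agent. A sketch (the flag names are real mesos-agent flags; the IPs, ports, and paths are arbitrary test values):

```python
# Sketch: distinct flag sets for test-only co-located agents. Flag names
# are real mesos-agent flags; the values here are arbitrary examples.
def agent_flags(i):
    return [
        f"--ip=127.1.1.{i}",
        f"--hostname=agent{i}",
        f"--port={5050 + i}",
        f"--work_dir=/var/lib/mesos/agent{i}",
        f"--runtime_dir=/var/run/mesos/agent{i}",
        f"--cgroups_root=mesos_agent{i}",  # the culprit found in this thread
    ]

cmd1 = "mesos-agent " + " ".join(agent_flags(1))
cmd2 = "mesos-agent " + " ".join(agent_flags(2))
```

Depending on the features in use, further flags (e.g. --docker_volume_checkpoint_dir, mentioned earlier in the thread) may also need separating.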

On Wed, Nov 15, 2017 at 11:58 AM, Dan Leary <d...@touchplan.io> wrote:

> Bingo.
> It probably doesn't hurt to differentiate --runtime_dir per agent but the
> real problem is that --cgroups_root needs to be different too.
> As one might infer from linux_launcher.cpp:
>
> Future<hashset> LinuxLauncherProcess::recover(
>> const list& states)
>> {
>>   // Recover all of the "containers" we know about based on the
>>   // existing cgroups.
>>   Try<vector> cgroups =
>> cgroups::get(freezerHierarchy, flags.cgroups_root);
>
>
> Thanks much.
>
> On Wed, Nov 15, 2017 at 11:37 AM, James Peach <jor...@gmail.com> wrote:
>
>>
>> > On Nov 15, 2017, at 8:24 AM, Dan Leary <d...@touchplan.io> wrote:
>> >
>> > Yes, as I said at the outset, the agents are on the same host, with
>> different ip's and hostname's and work_dir's.
>> > If having separate work_dirs is not sufficient to keep containers
>> separated by agent, what additionally is required?
>>
>> You might also need to specify other separate agent directories, like
>> --runtime_dir, --docker_volume_checkpoint_dir, etc. Check the output of
>> mesos-agent --flags.
>>
>> >
>> >
>> > On Wed, Nov 15, 2017 at 11:13 AM, Vinod Kone <vinodk...@apache.org>
>> wrote:
>> > How is agent2 able to see agent1's containers? Are they running on the
>> same box!? Are they somehow sharing the filesystem? If yes, that's not
>> supported.
>> >
>>
>>
>


Re: Adding a new agent terminates existing executors?

2017-11-15 Thread Vinod Kone
How is agent2 able to see agent1's containers? Are they running on the same
box!? Are they somehow sharing the filesystem? If yes, that's not supported.

On Wed, Nov 15, 2017 at 8:07 AM, Dan Leary <d...@touchplan.io> wrote:

> Sure, master log and agent logs are attached.
>
> Synopsis:  In the master log, tasks t01 and t02 are running...
>
> > I1114 17:08:15.972033  5443 master.cpp:6841] Status update TASK_RUNNING
> (UUID: 9686a6b8-b04d-4bc5-9d26-32d50c7b0f74) for task t01 of
> framework 10aa0208-4a85-466c-af89-7e73617516f5-0001 from agent
> 10aa0208-4a85-466c-af89-7e73617516f5-S0 at slave(1)@127.1.1.1:5051
> (agent1)
> > I1114 17:08:19.142276  5448 master.cpp:6841] Status update TASK_RUNNING
> (UUID: a6c72f31-2e47-4003-b707-9e8c4fb24f05) for task t02 of
> framework 10aa0208-4a85-466c-af89-7e73617516f5-0001 from agent
> 10aa0208-4a85-466c-af89-7e73617516f5-S0 at slave(1)@127.1.1.1:5051
> (agent1)
>
> Operator starts up agent2 around 17:08:50ish.  Executor1 and its tasks are
> terminated
>
> > I1114 17:08:54.835841  5447 master.cpp:6964] Executor 'executor1' of
> framework 10aa0208-4a85-466c-af89-7e73617516f5-0001 on agent
> 10aa0208-4a85-466c-af89-7e73617516f5-S0 at slave(1)@127.1.1.1:5051
> (agent1): terminated with signal Killed
> > I1114 17:08:54.835959  5447 master.cpp:9051] Removing executor
> 'executor1' with resources [] of framework 
> 10aa0208-4a85-466c-af89-7e73617516f5-0001
> on agent 10aa0208-4a85-466c-af89-7e73617516f5-S0 at slave(1)@
> 127.1.1.1:5051 (agent1)
> > I1114 17:08:54.837419  5436 master.cpp:6841] Status update TASK_FAILED
> (UUID: d6697064-6639-4d50-b88e-65b3eead182d) for task t01 of
> framework 10aa0208-4a85-466c-af89-7e73617516f5-0001 from agent
> 10aa0208-4a85-466c-af89-7e73617516f5-S0 at slave(1)@127.1.1.1:5051
> (agent1)
> > I1114 17:08:54.837497  5436 master.cpp:6903] Forwarding status update
> TASK_FAILED (UUID: d6697064-6639-4d50-b88e-65b3eead182d) for task t01
> of framework 10aa0208-4a85-466c-af89-7e73617516f5-0001
> > I1114 17:08:54.837896  5436 master.cpp:8928] Updating the state of task
> t01 of framework 10aa0208-4a85-466c-af89-7e73617516f5-0001 (latest
> state: TASK_FAILED, status update state: TASK_FAILED)
> > I1114 17:08:54.839159  5436 master.cpp:6841] Status update TASK_FAILED
> (UUID: 7e7f2078-3455-468b-9529-23aa14f7a7e0) for task t02 of
> framework 10aa0208-4a85-466c-af89-7e73617516f5-0001 from agent
> 10aa0208-4a85-466c-af89-7e73617516f5-S0 at slave(1)@127.1.1.1:5051
> (agent1)
> > I1114 17:08:54.839221  5436 master.cpp:6903] Forwarding status update
> TASK_FAILED (UUID: 7e7f2078-3455-468b-9529-23aa14f7a7e0) for task t02
> of framework 10aa0208-4a85-466c-af89-7e73617516f5-0001
> > I1114 17:08:54.839493  5436 master.cpp:8928] Updating the state of task
> t02 of framework 10aa0208-4a85-466c-af89-7e73617516f5-0001 (latest
> state: TASK_FAILED, status update state: TASK_FAILED)
>
> But agent2 doesn't register until later...
>
> > I1114 17:08:55.588762  5442 master.cpp:5714] Received register agent
> message from slave(1)@127.1.1.2:5052 (agent2)
>
> Meanwhile in the agent1 log, the termination of executor1 appears to be
> the result of the destruction of its container...
>
> > I1114 17:08:54.810638  5468 containerizer.cpp:2612] Container
> cbcf6992-3094-4d0f-8482-4d68f68eae84 has exited
> > I1114 17:08:54.810732  5468 containerizer.cpp:2166] Destroying container
> cbcf6992-3094-4d0f-8482-4d68f68eae84 in RUNNING state
> > I1114 17:08:54.810761  5468 containerizer.cpp:2712] Transitioning the
> state of container cbcf6992-3094-4d0f-8482-4d68f68eae84 from RUNNING to
> DESTROYING
>
> Apparently because agent2 decided to "recover" the very same container...
>
> > I1114 17:08:54.775907  6041 linux_launcher.cpp:373]
> cbcf6992-3094-4d0f-8482-4d68f68eae84 is a known orphaned container
> > I1114 17:08:54.779634  6037 containerizer.cpp:966] Cleaning up orphan
> container cbcf6992-3094-4d0f-8482-4d68f68eae84
> > I1114 17:08:54.779705  6037 containerizer.cpp:2166] Destroying container
> cbcf6992-3094-4d0f-8482-4d68f68eae84 in RUNNING state
> > I1114 17:08:54.779737  6037 containerizer.cpp:2712] Transitioning the
> state of container cbcf6992-3094-4d0f-8482-4d68f68eae84 from RUNNING to
> DESTROYING
> > I1114 17:08:54.780740  6041 linux_launcher.cpp:505] Asked to destroy
> container cbcf6992-3094-4d0f-8482-4d68f68eae84
>
> Seems like an issue with the containerizer?
>
>
> On Tue, Nov 14, 2017 at 4:46 PM, Vinod Kone <vinodk...@apache.org> wrote:
>
>> That seems weird then. A new agent coming up on a new ip and host,
>> shouldn't affect other agents running on different hosts. Can you share
>>

Re: Adding a new agent terminates existing executors?

2017-11-14 Thread Vinod Kone
That seems weird then. A new agent coming up on a new ip and host,
shouldn't affect other agents running on different hosts. Can you share
master logs that surface the issue?

On Tue, Nov 14, 2017 at 12:51 PM, Dan Leary <d...@touchplan.io> wrote:

> Just one mesos-master (no zookeeper) with --ip=127.0.0.1
> --hostname=localhost.
> In /etc/hosts are
>   127.1.1.1agent1
>   127.1.1.2agent2
> etc. and mesos-agent gets passed --ip=127.1.1.1 --hostname=agent1 etc.
>
>
> On Tue, Nov 14, 2017 at 3:41 PM, Vinod Kone <vinodk...@apache.org> wrote:
>
>> ```Experiments thus far are with a cluster all on a single host, master
>> on 127.0.0.1, agents have their own ip's and hostnames and ports.```
>>
>> What does this mean? How are all your masters and agents on the same host
>> but still get different ips and hostnames?
>>
>>
>> On Tue, Nov 14, 2017 at 12:22 PM, Dan Leary <d...@touchplan.io> wrote:
>>
>>> So I have a bespoke framework that runs under 1.4.0 using the v1 HTTP
>>> API, custom executor, checkpointing disabled.
>>> When the framework is running happily and a new agent is added to the
>>> cluster all the existing executors immediately get terminated.
>>> The scheduler is told of the lost executors and tasks and then receives
>>> offers about agents old and new and carries on normally.
>>>
>>> I would expect however that the existing executors should keep running
>>> and the scheduler should just receive offers about the new agent.
>>> It's as if agent recovery is being performed when the new agent is
>>> launched even though no old agent has exited.
>>> Experiments thus far are with a cluster all on a single host, master on
>>> 127.0.0.1, agents have their own ip's and hostnames and ports.
>>>
>>> Am I missing a configuration parameter?   Or is this correct behavior?
>>>
>>> -Dan
>>>
>>>
>>
>


Re: [VOTE] Release Apache Mesos 1.4.1 (rc1)

2017-11-13 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. A couple of red builds were due to known flaky tests, one
of which is already resolved on master.


*Revision*: c844db9ac7c0cef59be87438c6781bfb71adcc42

   - refs/tags/1.4.1-rc1

Configuration Matrix                                     gcc      clang
centos:7      --verbose --enable-libevent --enable-ssl
                autotools                                Success  Not run
                cmake                                    Success  Not run
              --verbose
                autotools                                Success  Not run
                cmake                                    Success  Not run
ubuntu:14.04  --verbose --enable-libevent --enable-ssl
                autotools                                Failed   Success
                cmake                                    Success  Success
              --verbose
                autotools                                Success  Failed
                cmake                                    Success  Success


On Thu, Nov 9, 2017 at 6:27 PM, Kapil Arya  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.4.1.
>
> 1.4.1 includes the following:
> 
> 
> * [MESOS-7873] - Expose `ExecutorInfo.ContainerInfo.NetworkInfo` in Mesos
> `state` endpoint.
> * [MESOS-7921] - ProcessManager::resume sometimes crashes accessing
> EventQueue.
> * [MESOS-7964] - Heavy-duty GC makes the agent unresponsive.
>
> * [MESOS-7968] - Handle `/proc/self/ns/pid_for_children` when parsing
> available namespace.
> * [MESOS-7969] - Handle cgroups v2 hierarchy when parsing
> /proc/self/cgroups.
> * [MESOS-7980] - Stout fails to compile with libc >= 2.26.
>
> * [MESOS-8051] - Killing TASK_GROUP fail to kill some tasks.
>
> * [MESOS-8080] - The default executor does not propagate missing task exit
> status correctly.
> * [MESOS-8090] - Mesos 1.4.0 crashes with 1.3.x agent with oversubscription
>
> * [MESOS-8135] - Masters can lose track of tasks' executor IDs.
>
> * [MESOS-8169] - Incorrect master validation forces executor IDs to be
> globally unique.
>
>
> The 

Re: Binary RPM packages for CentOS

2017-11-03 Thread Vinod Kone
This is great to see! Thanks for driving this Kapil.

Once the official packages are vetted, we should update our website to
point users to it. For new users, getting started with binary packages is
infinitely better than having to build from source.

Folks, please help us by testing these packages and providing feedback.

On Fri, Nov 3, 2017 at 10:32 AM, Kapil Arya  wrote:

> Hi All,
>
> We have some updates regarding the binary RPM build/release process for
> Mesos.
>
> Here are some important bullet points:
>
> * Jira: https://issues.apache.org/jira/browse/MESOS-7981
> * The RPM packages are now built using a jenkins job [1] on the Apache CI.
> * The build scripts are part of the Mesos source tree (support/packaging).
> * Packages are distributed using bintray [2].
> * Currently, we are using the mesos org under bintray. Once we have more
>   confidence in the packages, we'll move under the apache org to benefit
> from
>   the superior resource limits.
>
> Here are the steps to install the CentOS 7 packages from bintray:
>
> *$* cat > /tmp/bintray-mesos-el.repo <<EOF
> #bintray-mesos-el - packages by mesos from Bintray
> [bintray-mesos-el]
> name=bintray-mesos-el
> baseurl=https://dl.bintray.com/mesos/el/7/x86_64
> gpgcheck=0
> repo_gpgcheck=0
> enabled=1
> EOF
>
> *$* sudo mv /tmp/bintray-mesos-el.repo
> /etc/yum.repos.d/bintray-mesos-el.repo
>
> *$* sudo yum update
>
> *$* sudo yum install mesos
>
>
> To start, we have released 1.4.0 packages in the hope of getting some
> feedback from the community before releasing RPMs for the other supported
> versions.
>
> Future work:
> * Release RC packages in bintray.com/mesos/el-testing repo.
> * Release (rotating) nightly builds in bintray.com/mesos/el-unstable repo.
> * Create/release debian/ubuntu packages with the help from community
> members who
>   are already publishing such packages.
>
> Best,
> Kapil
>
> [1]: https://builds.apache.org/job/Mesos/job/Packaging/job/CentosRPMs
> [2]: https://bintray.com/mesos/
>


Re: Subscribe to an active framework through HTTP API Scheduler

2017-10-31 Thread Vinod Kone
Someone worked on it (a new operator API call) during the MesosCon EU
hackathon. They were planning to send a review once they wrapped it up.

On Tue, Oct 31, 2017 at 12:11 PM, Benjamin Mahler 
wrote:

> There had been discussion to support killing tasks from the operator API,
> but it's not in place yet.
>
> In the interim, you can manually kill them from the host.
>
> On Thu, Oct 26, 2017 at 11:24 AM, Zhitao Li  wrote:
>
>> Each active framework on HTTP scheduler API is allocated with a stream
>> id. This is included in the header "Mesos-Stream-Id" in initial subscribed
>> response.
>>
>> If you obtain this stream id, another process can "impersonate" this
>> framework to submit kill requests (using framework id, task id and the same
>> credential if auth is enabled).
>>
>> I cannot find a place where the Mesos master logs this stream id, so it's
>> probably best to record it on your framework's side.
>>
>> Hope this is helpful as we have done similar things recently.
>>
>>
>>
>> On Thu, Oct 26, 2017 at 2:10 AM, Manuel Montesino <
>> manuel.montes...@piksel.com> wrote:
>>
>>> Hi,
>>>
>>>
>>> We have a framework with some tasks that we would like to kill, but not
>>> the whole framework (teardown), so we would like to use the kill method of
>>> the HTTP scheduler API. The problem is that the caller needs to be
>>> subscribed: creating a new framework in stream mode and executing the kill
>>> method is not allowed to kill tasks of another framework (403 Forbidden).
>>>
>>>
>>> So, is it possible to subscribe/connect to an existing framework, or is
>>> there another way to operate on it from the HTTP API?
>>>
>>>
>>> Thanks in advance.
>>>
>>>
>>> *Manuel Montesino*
>>> Devops Engineer
>>>
>>> *E* *manuel.montesino@piksel(dot)com*
>>>
>>> Marie Curie,1. Ground Floor. Campanillas, Malaga 29590
>>> *liberating viewing* | *piksel.com *
>>>
>>> [image: Piksel_Email.png]
>>>
>>> This message is private and confidential. If you have received this
>>> message in error, please notify the sender or serviced...@piksel.com
>>> and remove it from your system.
>>>
>>> Piksel Inc is a company registered in the United States, 2100 Powers
>>> Ferry Road SE, Suite 400, Atlanta, GA 30339
>>> 
>>>
>>
>>
>>
>> --
>> Cheers,
>>
>> Zhitao Li
>>
>
>

