Re: Welcome Meng Zhu as PMC member and committer!

2018-11-12 Thread Zhitao Li
Congratulations!

On Sun, Nov 11, 2018, 7:26 PM Jason Lai  Congrats, Meng!
>
> On Mon, Nov 5, 2018 at 9:54 AM Benno Evers  wrote:
>
>> Congratulations, Meng!
>>
>> On Thu, Nov 1, 2018 at 9:54 AM Yan Xu  wrote:
>>
>>> Congratulations!
>>>
>>>
>>> On Wed, Oct 31, 2018 at 4:50 PM Vinod Kone  wrote:
>>>
 Congrats Meng!

 Thanks,
 Vinod

 > On Oct 31, 2018, at 4:26 PM, Gilbert Song 
 wrote:
 >
 > Well deserved, Meng!
 >
 >> On Wed, Oct 31, 2018 at 2:36 PM Benjamin Mahler 
 wrote:
 >> Please join me in welcoming Meng Zhu as a PMC member and committer!
 >>
 >> Meng has been active in the project for almost a year and has been
 very productive and collaborative. He is now one of the few people who
 understand the allocator code well, as well as the roadmap for this area
 of the project. He has also found and fixed bugs, and helped users in 
 slack.
 >>
 >> Thanks for all your work so far Meng, I'm looking forward to more of
 your contributions in the project.
 >>
 >> Ben

>>>
>>
>> --
>> Benno Evers
>> Software Engineer, Mesosphere
>>
>


Re: Operations Working Group - First Meeting

2018-07-20 Thread Zhitao Li
Please count me in. Looking forward to it.

Sent from my iPhone

> On Jul 20, 2018, at 4:05 PM, Gastón Kleiman  wrote:
> 
> Hi Abel,
> 
> I would love to learn more from people operating Mesos clusters of any
> size. We can discuss what is working great, what is on the roadmap, and
> what could be improved.
> 
> Some of us have been working on adding new per-framework metrics and extra
> logging to the Mesos master - I think that an operator would find them
> valuable to monitor/debug/troubleshoot a Mesos cluster, so I could also
> talk a bit about that.
> 
> The agenda (
> https://docs.google.com/document/d/1XjJfoksz70vbTvvT6FQ1t_J0SD1XIoipmYSvEHJfXt8/edit
> ) is still open and editable. I want to encourage everyone to add there the
> topics that interest you!
> 
> Looking forward to meeting you all over Zoom next week,
> 
> -Gastón
> 
>> On Tue, Jul 17, 2018 at 2:55 AM Abel Souza  wrote:
>> 
>> Thank you for setting this up Gaston,
>> 
>> Would you mind giving us a brief of what you have in mind for discussion?
>> 
>> Thank you,
>> 
>> Abel
>> 
>> On 07/17/2018 10:52 AM, Matt Jarvis wrote:
>> 
>> That's great news Gaston ! Let me know if you need any help from the
>> Community team.
>> 
>> Matt
>> 
>>> On Tue, 17 Jul 2018, 05:04 Gastón Kleiman,  wrote:
>>> 
>>> Hi all,
>>> 
>>> Thank you for responding to my previous emails - I think that we have
>>> quorum for a new working group!
>>> 
>>> Many of you who have expressed interest seem to be in Europe, so I tried
>>> to schedule the first meeting at a time that I hope will be friendly for
>>> people in both GMT+1 and GMT-8:
>>> 
>>> *Date:* Wednesday July 25th from 9:00-10:00 AM PDT
>>> *Agenda:*
>>> https://docs.google.com/document/d/1XjJfoksz70vbTvvT6FQ1t_J0SD1XIoipmYSvEHJfXt8/
>>> *Zoom URL:* https://zoom.us/j/310132146
>>> 
>>> 
>>> You can also find the event in the Mesos Community Calendar.
>>> 
>>> Feel free to add more topics to the agenda.
>>> 
>>> Looking forward to meeting you all next week,
>>> 
>>> -Gastón
>>> 
>> 
>> 


Re: [VOTE] Move the project repos to gitbox

2018-07-17 Thread Zhitao Li
+1

On Tue, Jul 17, 2018 at 8:10 AM James Peach  wrote:

>
>
> > On Jul 17, 2018, at 7:58 AM, Vinod Kone  wrote:
> >
> > Hi,
> >
> > As discussed in another thread and in the committers sync, there seems to
> be heavy interest in moving our project repos ("mesos", "mesos-site") from
> the "git-wip" git server to the new "gitbox" server to take better advantage
> of GitHub integrations.
> >
> > Please vote +1, 0, -1 regarding the move to gitbox. The vote will close
> in 3 business days.
>
>
> +1



-- 
Cheers,

Zhitao Li


Re: [VOTE] Release Apache Mesos 1.6.1 (rc1)

2018-07-03 Thread Zhitao Li
Fixes have been committed and backported to the 1.6.x branch. Please feel free
to cut the next RC at your convenience.

Thanks!

On Tue, Jul 3, 2018 at 3:37 PM Greg Mann  wrote:

> Hey folks, an update on the 1.6.1-rc2 candidate: an issue surfaced after
> the fix was merged for MESOS-8830, which is being addressed currently. I'll
> be AFK for the next 3 days, so I'll cut 1.6.1-rc2 this coming Monday. Sorry
> for the delay!
>
> Cheers,
> Greg
>
> On Mon, Jul 2, 2018 at 12:30 PM, Greg Mann  wrote:
>
>> Thanks for voting! Since a -1 vote was cast, I'll be cutting another
>> release candidate shortly. Keep your eyes peeled for the email!
>>
>> Cheers,
>> Greg
>>
>> On Fri, Jun 29, 2018 at 12:03 PM, Chun-Hung Hsiao 
>> wrote:
>>
>>> -1 on https://issues.apache.org/jira/browse/MESOS-8830.
>>>
>>> This is a critical bug that would wipe out persistent data. I'm
>>> backporting
>>> this to 1.4, 1.5 and 1.6.
>>>
>>> On Fri, Jun 29, 2018 at 9:05 AM Greg Mann  wrote:
>>>
>>> > The failures here are mostly command executor/default executor tests.
>>> > Looking at the test output, it seems that the tasks in these tests
>>> failed
>>> > to start successfully and send task status updates. I haven't seen this
>>> > issue on our internal CI; I'll try to re-run the build on ASF CI and
>>> if the
>>> > failures occur again, investigate why that environment is experiencing
>>> this
>>> > problem.
>>> >
>>> > -Greg
>>> >
>>> > On Wed, Jun 27, 2018 at 1:58 PM, Vinod Kone 
>>> wrote:
>>> >
>>> >> Hmm. A lot of tests failed when I ran this through ASF CI. Not sure if
>>> >> all of these are known flaky tests?
>>> >>
>>> >>
>>> >>
>>> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/50/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console
>>> >>
>>> >>
>>> >>
>>> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/50/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console
>>> >>
>>> >> On Wed, Jun 27, 2018 at 11:59 AM Jie Yu  wrote:
>>> >>
>>> >>> +1
>>> >>>
>>> >>> Passed on our internal CI that has the following matrix. I looked into
>>> >>> the only failed test; it looks to be flaky due to a race in the test.
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Jun 26, 2018 at 7:02 PM, Greg Mann 
>>> wrote:
>>> >>>
>>> >>>> Hi all,
>>> >>>>
>>> >>>> Please vote on releasing the following candidate as Apache Mesos
>>> 1.6.1.
>>> >>>>
>>> >>>>
>>> >>>> 1.6.1 includes the following:
>>> >>>>
>>> >>>>
>>> 
>>> >>>> *Announce major features here*
>>> >>>> *Announce major bug fixes here*
>>> >>>>
>>> >>>> The CHANGELOG for the release is available at:
>>> >>>>
>>> >>>>
>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.6.1-rc1
>>> >>>>
>>> >>>>
>>> 
>>> >>>>
>>> >>>> The candidate for Mesos 1.6.1 release is available at:
>>> >>>>
>>> >>>>
>>> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/mesos-1.6.1.tar.gz
>>> >>>>
>>> >>>> The tag to be voted on is 1.6.1-rc1:
>>> >>>>
>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.6.1-rc1
>>> >>>>
>>> >>>> The SHA512 checksum of the tarball can be found at:
>>> >>>>
>>> >>>>
>>> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/mesos-1.6.1.tar.gz.sha512
>>> >>>>
>>> >>>> The signature of the tarball can be found at:
>>> >>>>
>>> >>>>
>>> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/mesos-1.6.1.tar.gz.asc
>>> >>>>
>>> >>>> The PGP key used to sign the release is here:
>>> >>>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>> >>>>
>>> >>>> The JAR is in a staging repository here:
>>> >>>>
>>> https://repository.apache.org/content/repositories/orgapachemesos-1229
>>> >>>>
>>> >>>> Please vote on releasing this package as Apache Mesos 1.6.1!
>>> >>>>
>>> >>>> The vote is open until Fri Jun 29 18:46:28 PDT 2018 and passes if a
>>> >>>> majority of at least 3 +1 PMC votes are cast.
>>> >>>>
>>> >>>> [ ] +1 Release this package as Apache Mesos 1.6.1
>>> >>>> [ ] -1 Do not release this package because ...
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Greg
>>> >>>>
>>> >>>
>>> >>>
>>> >
>>>
>>
>>
>

-- 
Cheers,

Zhitao Li


Support image and resource pre-fetching in Mesos

2018-06-20 Thread Zhitao Li
Hi,

We have been working on optimizing container launch latency in our
Mesos-based stack, and one of the optimizations we are considering is to
pre-fetch the docker image and any necessary resources for the task/executor.

This is especially useful when updating containers of long-running
services.

Before delving into detailed proposal, I wonder if anyone has done similar
things or has similar requirements.

Thanks!

-- 
Cheers,

Zhitao Li


Re: narrowing task sandbox permissions

2018-06-15 Thread Zhitao Li
Adding James directly.

On Fri, Jun 15, 2018 at 11:06 AM Zhitao Li  wrote:

> Sorry for getting back to this really late, but we got bit by this
> behavior change in our environment.
>
> The broken scenario we had:
>
>1. We are using Aurora to launch docker containerizer based tasks on
>Mesos;
>2. Most of our docker containers had some legacy behavior: *the
>execution entered as "root" in the entry point script,* set up a couple
>of symlinks and other preparation work, then *"de-escalated" into a
>non-privileged user (i.e., "user")*;
>   1. This was added so that the entry point script has enough
>   permission to reconfigure certain side car processes (i.e, nginx) and
>   filesystem paths;
>3. Unfortunately, the "user" user will lose access to the sandbox
>after this change.
>
>
> While I'd acknowledge that the above behavior is legacy and a piece of major
> tech debt, cleaning it up for the thousands of applications on our platform
> has never been easy. Given that our org has other useful features available in
> 1.6, I would propose a couple of options:
>
>1. making the sandbox permission bits configurable
>   1. Certain frameworks know that their tasks do not leave sensitive
>   data in the sandbox, so we could provide this flexibility (it's very
>   useful in practice for migration to a container-based system);
>   2. Alternatively, making this reconfigurable via agent
>   flags: This could be more secure and easier to manage, but lacks the
>   flexibility of allowing different frameworks to do different things.
>2. Until the customization is in place, consider a revert of the
>permission bit change so we preserve the original behavior.
>
> Thanks.
>
> Zhitao and Jason
>


-- 
Cheers,

Zhitao Li


Re: narrowing task sandbox permissions

2018-06-15 Thread Zhitao Li
Sorry for getting back to this really late, but we got bit by this behavior
change in our environment.

The broken scenario we had:

   1. We are using Aurora to launch docker containerizer based tasks on
   Mesos;
   2. Most of our docker containers had some legacy behavior: *the
   execution entered as "root" in the entry point script,* set up a couple
   of symlinks and other preparation work, then *"de-escalated" into a
   non-privileged user (i.e., "user")*;
  1. This was added so that the entry point script has enough
  permission to reconfigure certain side car processes (i.e, nginx) and
  filesystem paths;
   3. Unfortunately, the "user" user will lose access to the sandbox after
   this change.


While I'd acknowledge that the above behavior is legacy and a piece of major
tech debt, cleaning it up for the thousands of applications on our platform
has never been easy. Given that our org has other useful features available in
1.6, I would propose a couple of options:

   1. making the sandbox permission bits configurable
  1. Certain frameworks know that their tasks do not leave sensitive
  data in the sandbox, so we could provide this flexibility (it's very useful
  in practice for migration to a container-based system);
  2. Alternatively, making this reconfigurable via agent flags:
  This could be more secure and easier to manage, but lacks the flexibility
  of allowing different frameworks to do different things.
   2. Until the customization is in place, consider a revert of the
   permission bit change so we preserve the original behavior.
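To make the breakage concrete, here is a minimal sketch (not the agent's actual code; the mode values are illustrative of a world-readable sandbox being narrowed) of why a process that de-escalates to a non-owner user loses access:

```python
import os
import stat
import tempfile

# Illustration of the scenario above: the agent creates the sandbox owned
# by the task user from TaskInfo (e.g. root), and the entry point later
# drops privileges to an unrelated user. Whether that user can still enter
# the sandbox depends entirely on the directory's permission bits.

def other_users_can_enter(mode):
    # A process running as a different, non-owner user needs the "other"
    # read+execute bits to list and enter the directory.
    return bool(mode & stat.S_IROTH) and bool(mode & stat.S_IXOTH)

sandbox = tempfile.mkdtemp()

os.chmod(sandbox, 0o755)   # old behavior: world-readable sandbox
old = stat.S_IMODE(os.stat(sandbox).st_mode)

os.chmod(sandbox, 0o750)   # narrowed permissions: no "other" access
new = stat.S_IMODE(os.stat(sandbox).st_mode)

print(other_users_can_enter(old))  # True  -> de-escalated user still works
print(other_users_can_enter(new))  # False -> de-escalated user locked out

os.rmdir(sandbox)
```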

Thanks.

Zhitao and Jason


Re: Questions about secret handling in Mesos

2018-05-11 Thread Zhitao Li
Hi Vinod,

I filed a task https://issues.apache.org/jira/browse/MESOS-8909 for this.
If we can agree that this is something worth pursuing, I'll try to post some
ideas on whether there is an efficient way to do it.

On Thu, Apr 26, 2018 at 3:32 PM, Vinod Kone  wrote:

> We do direct protobuf to JSON conversion for our API endpoints and I don't
> think we do any special case logic for `Secret` type in that conversion. So
> `value` based secrets will have their value show up in v1 (and likely v0)
> API endpoints.
>
> On Mon, Apr 23, 2018 at 9:25 AM, Zhitao Li  wrote:
>
>> Hi Alexander,
>>
>> We discovered that in our own testing and thus do not plan to use
>> environment variables. For the `volume/secret` case, I believe it's possible
>> to be careful enough that we do not log it, so it's more about whether we
>> want to promise that.
>>
>> What do you think?
>>
>> On Mon, Apr 23, 2018 at 5:13 AM, Alexander Rojas > > wrote:
>>
>>>
>>> Hey Zhitao,
>>>
>>> I sadly have to tell you that the first assumption is not correct. If
>>> you use environment based secrets, docker and verbose mode, they will get
>>> printed (see this patch https://reviews.apache.org/r/57846/). The
>>> reason is that the docker command will get logged and it might contain your
>>> secrets. You may end up with some logging line like:
>>>
>>> ```
>>> I0129 14:09:22.444318 docker.cpp:1139] Running docker -H
>>> unix:///var/run/docker.sock run --cpu-shares 25 --memory 278435456 -e
>>> ADMIN_PASSWORD=test_password …
>>> ```
>>>
>>>
>>> On 19. Apr 2018, at 19:57, Zhitao Li  wrote:
>>>
>>> Hello,
>>>
>>> We at Uber plan to use volume/secret isolator to send secrets from Uber
>>> framework to Mesos agent.
>>>
>>> For this purpose, we are referring to these documents:
>>>
>>>- File based secrets design doc
>>>
>>> <https://docs.google.com/document/d/18raiiUfxTh-JBvjd6RyHe_TOScY87G_bMi5zBzMZmpc/edit#>
>>>and slides
>>>
>>> <http://schd.ws/hosted_files/mesosconasia2017/70/Secrets%20Management%20in%20Mesos.pdf>
>>>.
>>>- Apache Mesos secrets documentation
>>><http://mesos.apache.org/documentation/latest/secrets/>
>>>
>>> Could you please confirm that the following assumptions are correct?
>>>
>>>- Mesos agent and master will never log the secret data at any
>>>logging level;
>>>- Mesos agent and master will never expose the secret data as part
>>>of any API response;
>>>- Mesos agent and master will never store the secret in any
>>>persistent storage, but only on tmpfs or ramfs;
>>>- When the secret is first downloaded on the mesos agent, it will be
>>>stored as "root" on the tmpfs/ramfs before being mounted in the container
>>>ramfs.
>>>
>>> If the above assumptions are true, then I would like to see them documented
>>> as part of the Apache Mesos secrets documentation
>>> <http://mesos.apache.org/documentation/latest/secrets/>. Otherwise,
>>> we'd like to have a design discussion with the maintainer of the isolator.
>>>
>>> We appreciate your help regarding this. Thanks!
>>>
>>> Regards,
>>> Aditya And Zhitao
>>>
>>>
>>>
>>
>>
>> --
>> Cheers,
>>
>> Zhitao Li
>>
>
>


-- 
Cheers,

Zhitao Li


Re: mesos-slave Failed to initialize: Failed to bind on 0.0.0.0:0: Address already in use: Address already in use [98]

2018-05-03 Thread Zhitao Li
5:::*
> LISTEN  19988/docker-proxy
> tcp6   0  0 :::31121:::*
> LISTEN  29037/docker-proxy
> udp0  0 0.0.0.0:24224   0.0.0.0:*
>28584/ruby
> udp0  0 192.168.0.1:123 0.0.0.0:*
>1348/ntpd
> udp0  0 59.110.24.56:1230.0.0.0:*
>1348/ntpd
> udp0  0 10.25.141.251:123   0.0.0.0:*
>1348/ntpd
> udp0  0 127.0.0.1:123   0.0.0.0:*
>1348/ntpd
> udp0  0 0.0.0.0:123 0.0.0.0:*
>1348/ntpd
> udp6   0  0 :::123  :::*
>   1348/ntpd
> Active UNIX domain sockets (only servers)
> Proto RefCnt Flags   Type   State I-Node   PID/Program
> namePath
> unix  2  [ ACC ] STREAM LISTENING 159071424/AliYunDun
> /tmp/Aegis-
> unix  2  [ ACC ] STREAM LISTENING 178254587 28237/python
>   /tmp/supervisor.sock.9
> unix  2  [ ACC ] STREAM LISTENING 178256956 19165/uwsgi
>  /ddserver/ddserver.sock
> unix  2  [ ACC ] STREAM LISTENING 229374127 17729/xin
>  /var/lib/alauda/xin/xin.sock
> unix  2  [ ACC ] STREAM LISTENING 15609
> 983/dbus-daemon /var/run/dbus/system_bus_socket
> unix  2  [ ACC ] STREAM LISTENING 8117 1/init
> @/com/ubuntu/upstart
> unix  2  [ ACC ] STREAM LISTENING 178252397 27819/dockerd
>  /var/run/docker.sock
> unix  2  [ ACC ] STREAM LISTENING 178252401
> 27855/docker-contai /var/run/docker/libcontainerd/docker-containerd.sock
> unix  2  [ ACC ] STREAM LISTENING 145571282/nscd
>  /var/run/nscd/socket
> unix  2  [ ACC ] STREAM LISTENING 178252469 27819/dockerd
>  /run/docker/libnetwork/2dc9ef136eb6933a5634dcf439d9fb
> d4acf96a5de16eb79122b6ddd8aefcf81c.sock
> unix  2  [ ACC ] STREAM LISTENING 178253543 28184/node
>   /root/.forever/sock/worker.1524559297488oBZ.sock
> unix  2  [ ACC ] STREAM LISTENING 159081424/AliYunDun
> /usr/local/aegis/Aegis-
> unix  2  [ ACC ] SEQPACKET  LISTENING 8173
>  416/systemd-udevd   /run/udev/control
>
> So where is the problem? Thanks.
>



-- 
Cheers,

Zhitao Li


Re: Questions about secret handling in Mesos

2018-04-23 Thread Zhitao Li
Hi Alexander,

We discovered that in our own testing and thus do not plan to use
environment variables. For the `volume/secret` case, I believe it's possible
to be careful enough that we do not log it, so it's more about whether we
want to promise that.

What do you think?

On Mon, Apr 23, 2018 at 5:13 AM, Alexander Rojas 
wrote:

>
> Hey Zhitao,
>
> I sadly have to tell you that the first assumption is not correct. If you
> use environment based secrets, docker and verbose mode, they will get
> printed (see this patch https://reviews.apache.org/r/57846/). The reason
> is that the docker command will get logged and it might contain your
> secrets. You may end up with some logging line like:
>
> ```
> I0129 14:09:22.444318 docker.cpp:1139] Running docker -H
> unix:///var/run/docker.sock run --cpu-shares 25 --memory 278435456 -e
> ADMIN_PASSWORD=test_password …
> ```
>
>
> On 19. Apr 2018, at 19:57, Zhitao Li  wrote:
>
> Hello,
>
> We at Uber plan to use volume/secret isolator to send secrets from Uber
> framework to Mesos agent.
>
> For this purpose, we are referring to these documents:
>
>- File based secrets design doc
>
> <https://docs.google.com/document/d/18raiiUfxTh-JBvjd6RyHe_TOScY87G_bMi5zBzMZmpc/edit#>
>and slides
>
> <http://schd.ws/hosted_files/mesosconasia2017/70/Secrets%20Management%20in%20Mesos.pdf>
>.
>- Apache Mesos secrets documentation
><http://mesos.apache.org/documentation/latest/secrets/>
>
> Could you please confirm that the following assumptions are correct?
>
>- Mesos agent and master will never log the secret data at any logging
>level;
>- Mesos agent and master will never expose the secret data as part of
>any API response;
>- Mesos agent and master will never store the secret in any persistent
>storage, but only on tmpfs or ramfs;
>- When the secret is first downloaded on the mesos agent, it will be
>stored as "root" on the tmpfs/ramfs before being mounted in the container
>ramfs.
>
> If the above assumptions are true, then I would like to see them documented
> as part of the Apache Mesos secrets documentation
> <http://mesos.apache.org/documentation/latest/secrets/>. Otherwise, we'd
> like to have a design discussion with the maintainer of the isolator.
>
> We appreciate your help regarding this. Thanks!
>
> Regards,
> Aditya And Zhitao
>
>
>


-- 
Cheers,

Zhitao Li


Questions about secret handling in Mesos

2018-04-19 Thread Zhitao Li
Hello,

We at Uber plan to use volume/secret isolator to send secrets from Uber
framework to Mesos agent.

For this purpose, we are referring to these documents:

   - File based secrets design doc
   

   and slides
   

   .
   - Apache Mesos secrets documentation
   

Could you please confirm that the following assumptions are correct?

   - Mesos agent and master will never log the secret data at any logging
   level;
   - Mesos agent and master will never expose the secret data as part of
   any API response;
   - Mesos agent and master will never store the secret in any persistent
   storage, but only on tmpfs or ramfs;
   - When the secret is first downloaded on the mesos agent, it will be
   stored as "root" on the tmpfs/ramfs before being mounted in the container
   ramfs.

If the above assumptions are true, then I would like to see them documented
as part of the Apache Mesos secrets documentation
. Otherwise, we'd
like to have a design discussion with the maintainer of the isolator.

We appreciate your help regarding this. Thanks!

Regards,
Aditya And Zhitao


Re: Reason of cascaded kill in a group

2018-04-10 Thread Zhitao Li
Hi Benjamin,

Yes, that's what I meant: adding a new reason for such cascaded kills.

On Tue, Apr 10, 2018 at 1:17 PM, Benjamin Mahler  wrote:

> Are you saying that there was no reason previously, and there would be a
> reason after the change? If so, adding a reason where one did not exist is
> safe from a backwards compatibility perspective.
>
> On Mon, Apr 9, 2018 at 10:32 AM, Zhitao Li  wrote:
>
>> Hi,
>>
>> We are considering adding a new reason to StatusUpdate::Reason
>> <https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L2395>,
>> to reflect the case where a task in a task group is killed as part of a cascade:
>>
>> Currently, if a task fails in a task group, other active tasks in the
>> same group will see *TASK_KILLED* without any custom reason. We would
>> like to provide a custom reason like *REASON_TASK_GROUP_KILLED* to
>> distinguish whether the task was killed upon request of the scheduler or
>> due to a cascaded failure.
>>
>>
>> Question to framework maintainers: does any framework depend on the value
>> of this reason? If not, we can probably just change the reason without an
>> opt-in mechanism from frameworks (i.e., a new framework capability).
>>
>> Please let me know if your framework has such a dependency.
>>
>> Thanks!
>>
>>
>> --
>> Cheers,
>>
>> Zhitao Li
>>
>
>


-- 
Cheers,

Zhitao Li


Reason of cascaded kill in a group

2018-04-09 Thread Zhitao Li
Hi,

We are considering adding a new reason to StatusUpdate::Reason
<https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L2395>,
to reflect the case where a task in a task group is killed as part of a cascade:

Currently, if a task fails in a task group, other active tasks in the same
group will see *TASK_KILLED* without any custom reason. We would like to
provide a custom reason like *REASON_TASK_GROUP_KILLED* to distinguish
whether the task was killed upon request of the scheduler or due to a
cascaded failure.
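To illustrate what the proposed reason would let schedulers do, here is a small sketch. *REASON_TASK_GROUP_KILLED* is the hypothetical new enum value from this proposal (today such updates arrive with no reason set), and the handler below is illustrative, not a real framework API:

```python
# Sketch of a scheduler-side status-update handler that branches on the
# proposed reason. Reasons are modeled as plain strings here for brevity;
# in Mesos they would be TaskStatus.Reason enum values.

def classify_kill(state, reason):
    if state != "TASK_KILLED":
        return "other"
    if reason == "REASON_TASK_GROUP_KILLED":
        # A sibling task in the group failed: retry the whole group
        # rather than treating this task as individually faulty.
        return "cascaded-kill"
    # No reason (or another reason): a scheduler/operator-requested kill.
    return "requested-kill"

print(classify_kill("TASK_KILLED", "REASON_TASK_GROUP_KILLED"))  # cascaded-kill
print(classify_kill("TASK_KILLED", None))                        # requested-kill
```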


Question to framework maintainers: does any framework depend on the value of
this reason? If not, we can probably just change the reason without an
opt-in mechanism from frameworks (i.e., a new framework capability).

Please let me know if your framework has such a dependency.

Thanks!


-- 
Cheers,

Zhitao Li


Re: Support deadline for tasks

2018-03-23 Thread Zhitao Li
Thanks James. I'll update the JIRA with our names and start with some
prototype.

On Thu, Mar 22, 2018 at 9:07 PM, James Peach  wrote:

>
>
> > On Mar 22, 2018, at 10:06 AM, Zhitao Li  wrote:
> >
> > In our environment, we run a lot of batch jobs, some of which have tight
> timelines. If any task in the job runs longer than x hours, it does not
> make sense to run it anymore.
> >
> > For instance, a team would submit a job which builds a weekly index and
> repeats every Monday. If the job does not finish before next Monday for
> whatever reason, there is no point to keep any task running.
> >
> > We believe that implementing deadline tracking distributed across our
> cluster makes more sense, as it makes the system more scalable and also
> keeps our centralized state machine simpler.
> >
> > One idea I have right now is to add an optional TimeInfo deadline field to
> TaskInfo, and all default executors in Mesos can simply terminate the
> task and send a proper StatusUpdate.
> >
> > I summarized above idea in MESOS-8725.
> >
> > Please let me know what you think. Thanks!
>
> This sounds both useful and simple to implement. I’m happy to shepherd if
> you’d like
>
> J




-- 
Cheers,

Zhitao Li


Support deadline for tasks

2018-03-22 Thread Zhitao Li
In our environment, we run a lot of batch jobs, some of which have tight
timelines. If any task in the job runs longer than x hours, it does not
make sense to run it anymore.

For instance, a team would submit a job which builds a weekly index and
repeats every Monday. If the job does not finish before next Monday for
whatever reason, there is no point to keep any task running.

We believe that implementing deadline tracking distributed across our
cluster makes more sense, as it makes the system more scalable and also
keeps our centralized state machine simpler.

One idea I have right now is to add an *optional* *TimeInfo deadline* field to
TaskInfo, and all default executors in Mesos can simply terminate the
task and send a proper *StatusUpdate.*
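A sketch of the executor-side enforcement this would enable. The names below (`kill_task`, `send_status_update`, the deadline-in-seconds argument) are illustrative stand-ins, not the real Mesos executor API:

```python
import threading
import time

# Idea from the proposal: the executor arms a one-shot timer when the task
# launches; if the deadline passes before the task finishes, it kills the
# task and sends a terminal status update explaining why.

class DeadlineEnforcer:
    def __init__(self, kill_task, send_status_update):
        self.kill_task = kill_task
        self.send_status_update = send_status_update
        self.timers = {}

    def on_launch(self, task_id, deadline_secs):
        # Arm the timer; cancel it later if the task finishes in time.
        timer = threading.Timer(deadline_secs, self._expire, args=[task_id])
        self.timers[task_id] = timer
        timer.start()

    def on_finish(self, task_id):
        timer = self.timers.pop(task_id, None)
        if timer:
            timer.cancel()

    def _expire(self, task_id):
        self.timers.pop(task_id, None)
        self.kill_task(task_id)
        self.send_status_update(task_id, "TASK_FAILED",
                                reason="deadline exceeded")

killed = []
updates = []
enforcer = DeadlineEnforcer(killed.append,
                            lambda t, s, reason: updates.append((t, s, reason)))
enforcer.on_launch("task-1", 0.05)   # 50ms deadline for the demo
time.sleep(0.2)                      # simulate the task overrunning
print(killed)    # ['task-1']
print(updates)   # [('task-1', 'TASK_FAILED', 'deadline exceeded')]
```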

I summarized above idea in MESOS-8725
<https://issues.apache.org/jira/browse/MESOS-8725>.

Please let me know what you think. Thanks!

-- 
Cheers,

Zhitao Li


Re: Release policy and 1.6 release schedule

2018-03-14 Thread Zhitao Li
An additional data point is how long it takes from the first RC being cut to
the final release vote passing. That probably indicates the smoothness of
the release process and how good the quality control measures are.

I would argue for not delaying releases for new features, and for aligning
with the schedule declared in our policy. That makes it easier for upstream
projects to gauge when a feature will be ready and when they can try it out.

On Tue, Mar 13, 2018 at 3:10 PM, Greg Mann  wrote:

> Hi folks,
> During the recent API working group meeting [1], we discussed the release
> schedule. This has been a recurring topic of discussion in the developer
> sync meetings, and while our official policy still specifies time-based
> releases at a bi-monthly cadence, in practice we tend to gate our releases
> on the completion of certain features, and our releases go out on a
> less-frequent basis. Here are the dates of our last few release blog posts,
> which I'm assuming correlate pretty well with the actual release dates:
>
> 1.5.0: 2/8/18
> 1.4.0: 9/18/17
> 1.3.0: 6/7/17
> 1.2.0: 3/8/17
> 1.1.0: 11/10/16
>
> Our current cadence seems to be around 3-4 months between releases, while
> our documentation states that we release every two months [2]. My primary
> motivation here is to bring our documented policy in line with our
> practice, whatever that may be. Do people think that we should attempt to
> bring our release cadence more in line with our current stated policy, or
> should the policy be changed to reflect our current practice?
>
> If we were to attempt to align with our stated policy for 1.6.0, then we
> would release around April 8, which would probably mean cutting an RC
> sometime around the end of March or beginning of April. This is very soon!
> :)
>

> I'm currently working with Gastón on offer operation feedback, and I'm not
> sure that we would have it ready in time for an early April release date.
> Personally, I would be OK with this, since we could land the feature in
> 1.7.0 in June. However, I'm not sure how well this schedule would work for
> the features that other people are currently working on.
>

A highly important feature our org needs is resizing of persistent volumes. I
think it has a good chance of making the stated 1.6 schedule.


>
> I'm curious to hear people's thoughts on this, developers and users alike!
>
> Cheers,
> Greg
>
>
> [1] https://docs.google.com/document/d/1JrF7pA6gcBZ6iyeP5YgD
> G62ifn0cZIBWw1f_Ler6fLM/edit#
> [2] http://mesos.apache.org/documentation/latest/versioning/
> #release-schedule
>



-- 
Cheers,

Zhitao Li


Re: Welcome Chun-Hung Hsiao as Mesos Committer and PMC Member

2018-03-12 Thread Zhitao Li
Congrats, Chun!

On Sun, Mar 11, 2018 at 11:47 PM, Gilbert Song 
wrote:

> Congrats, Chun!
>
> It is great to have you in the community!
>
> - Gilbert
>
> On Sun, Mar 11, 2018 at 4:40 PM, Andrew Schwartzmeyer <
> and...@schwartzmeyer.com> wrote:
>
> > Congratulations Chun!
> >
> > I apologize for not also giving you a +1, as I certainly would have, but
> > just discovered my mailing list isn't working. Just a heads up, don't let
> > that happen to you too!
> >
> > I look forward to continuing to work with you.
> >
> > Cheers,
> >
> > Andy
> >
> >
> > On 03/10/2018 9:14 pm, Jie Yu wrote:
> >
> >> Hi,
> >>
> >> I am happy to announce that the PMC has voted Chun-Hung Hsiao as a new
> >> committer and member of PMC for the Apache Mesos project. Please join me
> >> to
> >> congratulate him!
> >>
> >> Chun has been an active contributor for the past year. His main
> >> contributions to the project include:
> >> * Designed and implemented gRPC client support to libprocess
> (MESOS-7749)
> >> * Designed and implemented Storage Local Resource Provider (MESOS-7235,
> >> MESOS-8374)
> >> * Implemented part of the CSI support (MESOS-7235, MESOS-8374)
> >>
> >> Chun is friendly and humble, but also intelligent, insightful, and
> >> opinionated. I am confident that he will be a great addition to our
> >> committer pool. Thanks Chun for all your contributions to the project so
> >> far!
> >>
> >> His committer checklist can be found here:
> >> https://docs.google.com/document/d/1FjroAvjGa5NdP29zM7-2eg6t
> >> LPAzQRMUmCorytdEI_U/edit?usp=sharing
> >>
> >> - Jie
> >>
> >
> >
>



-- 
Cheers,

Zhitao Li


Re: Get leading master from zookeeper

2018-03-06 Thread Zhitao Li
Mesos implements the official ZK leader election recipe
<https://zookeeper.apache.org/doc/current/recipes.html#sc_leaderElection>,
so you should expect a list of ephemeral znodes under your election path
(often `mesos`, but that depends on your config).

The one with the lowest sequence-number suffix contains the leading master's
address (in JSON format).

Hope this is useful.
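The recipe can be sketched as follows. Assumptions: the election path's children are named like `json.info_<sequence>` (as noted in the question), and each such znode holds a JSON-serialized MasterInfo whose `address` field carries the leader's ip/port; fetching the children and znode data is left to your ZK client of choice (e.g. kazoo's `get_children`/`get`):

```python
import json

# Pick the leading master's znode from the election path's children:
# keep only the MasterInfo znodes and take the lowest sequence suffix.
def pick_leader_node(children):
    nodes = [n for n in children if n.startswith("json.info_")]
    if not nodes:
        return None
    return min(nodes, key=lambda n: int(n.rsplit("_", 1)[1]))

# Parse the leader's address out of the znode's JSON payload.
def leader_address(masterinfo_json):
    info = json.loads(masterinfo_json)
    return "%s:%d" % (info["address"]["ip"], info["address"]["port"])

children = ["json.info_0000000042", "json.info_0000000007", "log_replicas"]
leader = pick_leader_node(children)
print(leader)  # json.info_0000000007

# Example payload you would read from that znode:
payload = '{"address": {"ip": "10.0.0.5", "port": 5050}, "hostname": "m1"}'
print(leader_address(payload))  # 10.0.0.5:5050
```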

On Mon, Mar 5, 2018 at 9:54 PM, Ajay V  wrote:

> Hello,
>
> Is there a write up somewhere that I may have missed finding that talks
> about how to find the leading master given the zookeeper url? I can find
> that there are json.info_.. MasterInfo jsons available in zk, but how can I
> translate this to figure out the leading master at any given point in time?
>
> Regards,
> Ajay
>



-- 
Cheers,

Zhitao Li


Re: [VOTE] Release Apache Mesos 1.5.0 (rc2)

2018-02-03 Thread Zhitao Li
+1 (non-binding)

Tested by running all tests on a Debian/jessie server on AWS.

On Fri, Feb 2, 2018 at 3:25 PM, Jie Yu  wrote:

> +1
>
> Verified in our internal CI that `sudo make check` passed in CentOS 6,
> CentOS7, Debian 8, Ubuntu 14.04, Ubuntu 16.04 (both w/ or w/o SSL enabled).
>
>
> On Thu, Feb 1, 2018 at 5:36 PM, Gilbert Song  wrote:
>
> > Hi all,
> >
> > Please vote on releasing the following candidate as Apache Mesos 1.5.0.
> >
> > 1.5.0 includes the following:
> > 
> > 
> >   * Support Container Storage Interface (CSI).
> >   * Agent reconfiguration policy.
> >   * Auto GC docker images in Mesos Containerizer.
> >   * Standalone containers.
> >   * Support gRPC client.
> >   * Non-leading VOTING replica catch-up.
> >
> >
> > The CHANGELOG for the release is available at:
> > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
> > lain;f=CHANGELOG;hb=1.5.0-rc2
> > 
> > 
> >
> > The candidate for Mesos 1.5.0 release is available at:
> > https://dist.apache.org/repos/dist/dev/mesos/1.5.0-rc2/
> mesos-1.5.0.tar.gz
> >
> > The tag to be voted on is 1.5.0-rc2:
> > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.5.0-rc2
> >
> > The MD5 checksum of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/1.5.0-rc2/mesos
> > -1.5.0.tar.gz.md5
> >
> > The signature of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/1.5.0-rc2/mesos
> > -1.5.0.tar.gz.asc
> >
> > The PGP key used to sign the release is here:
> > https://dist.apache.org/repos/dist/release/mesos/KEYS
> >
> > The JAR is in a staging repository here:
> > https://repository.apache.org/content/repositories/orgapachemesos-1222
> >
> > Please vote on releasing this package as Apache Mesos 1.5.0!
> >
> > The vote is open until Tue Feb  6 17:35:16 PST 2018 and passes if a
> > majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Mesos 1.5.0
> > [ ] -1 Do not release this package because ...
> >
> > Thanks,
> > Jie and Gilbert
> >
>



-- 
Cheers,

Zhitao Li


Re: Questions about Pods and the Mesos Containerizer

2018-01-24 Thread Zhitao Li
Glad someone is also looking at this.

On Wed, Jan 24, 2018 at 2:43 PM, Jie Yu  wrote:

> I can help answer some of them:
>
> Is it possible to do healthchecks per task in a pod?
>
> I believe so, given that health checks are defined at the TaskInfo level,
> but AlexR can confirm.
>
>  Is it possible to allocate a separate IP address per container in a pod?
>
>  Not right now, but possible. We need to change the CNI network isolator
> to support that, but there might be caveats on the road.
>
> Is there any plan to support the Docker containeriser with pods?
>
> Probably not. If I want to do that, I'd prefer we refactor Docker
> containerizer to use containerd first, and then support pod there.
>
>  Timeframe for debugging tools (equivalent of docker exec, etc)?
>
> We'll have a containerization WG meeting tomorrow morning. I'll make sure
> this is on the list. No timeframe yet, but this shouldn't take too long.
>
> Is there any performance data about using the Mesos containeriser with
>> container images versus using the Docker containeriser?
>> how does the Mesos containerizer handle extremely large images?
>>
>
We don't have a systematic benchmark yet, but we plan to do something in Q1,
after which we'll share some results. In a one-off case, we observed that
large image provisioning was ~2x faster with the Mesos containerizer than
with the Docker daemon. The engineer believed this was because Linux tar is
faster than the tar utilities in Go's standard library, but this has not
been independently verified.


> how does the Mesos containerizer handle dozens/hundreds of concurrent
>> pulls?
>
>
Same, we plan to do something and share our results once we have them.


>
> I believe Uber folks might have some data on this (cc Zhitao)?
>
> - Jie
>
> On Wed, Jan 24, 2018 at 2:21 PM, David Morrison  wrote:
>
>> Hi Mesos community!
>>
>> We’re in the process of designing a Mesos framework to launch multiple
>> containers together on the same host and are considering a couple of
>> approaches. The first is to use pods (with the TASK_GROUP primitive), and
>> the second is write a custom executor that launches nested containers and
>> use CNI to handle networking.
>>
>> With that in mind, we had the following questions:
>>
>> Questions about pods/task_groups:
>>
>>-
>>
>>Is it possible to do healthchecks per task in a pod?
>>-
>>
>>Is it possible to allocate a separate IP address per container in a
>>pod?
>>-
>>
>>Is there any plan to support the Docker containeriser with pods?
>>
>>
>> Questions about UCR/Mesos containerizer:
>>
>>-
>>
>>Timeframe for debugging tools (equivalent of docker exec, etc)?
>>-
>>
>>Is there any performance data about using the Mesos containeriser
>>with container images versus using the Docker containeriser?
>>-
>>
>>   how does the Mesos containerizer handle extremely large images?
>>   -
>>
>>   how does the Mesos containerizer handle dozens/hundreds of
>>   concurrent pulls?
>>
>>
>> If anyone has had any experience using the UCR and/or pods with the sort
>> of workflow we’re considering, your input would be highly useful!
>>
>> Cheers,
>>
>> David Morrison
>>
>> Software Engineer @ Yelp
>>
>>
>


-- 
Cheers,

Zhitao Li


Duplicate task ID for same framework on different agents

2017-12-20 Thread Zhitao Li
Hi all,

We have seen a Mesos master crash loop after a leader failover. After more
investigation, it seems that the same task ID managed to be created on
multiple Mesos agents in the cluster.

One possible sequence of events that can lead to such a problem:

1. Task T1 was launched via master M1 on agent A1 for framework F;
2. Master M1 failed over to M2;
3. Before A1 reregistered with M2, the same T1 was launched onto agent A2:
M2 did not know about the previous T1 yet, so it accepted the launch and
sent it to A2;
4. A1 reregistered: this probably crashed M2 (because the same task cannot
be added twice);
5. When M3 tried to come up after M2, it crashed as well because both A1
and A2 tried to re-add T1 to the framework.

(I only have logs to prove the last step right now)

This happened on 1.4.0 masters.

Although this is probably triggered by incorrect retry logic on the
framework side, I wonder whether the Mesos master should add extra
protection to prevent such an issue from causing a master crash loop. Some
possible ideas are to instruct one of the agents carrying tasks with a
duplicate ID to terminate the corresponding tasks, or to refuse to
reregister such agents and instruct them to shut down.

I also filed MESOS-8353 <https://issues.apache.org/jira/browse/MESOS-8353>
to track this potential bug. Thanks!
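On the framework side, a minimal guard against reissuing a task ID that was already submitted might look like the following sketch (`LaunchGuard` is an illustrative name, not a Mesos API):

```python
# Sketch of a framework-side guard against reissuing a task ID that was
# already submitted. LaunchGuard is an illustrative name, not a Mesos API.

class LaunchGuard:
    def __init__(self):
        self._submitted = set()

    def may_launch(self, task_id):
        """True the first time a task ID is seen; False for any retry with
        the same ID, which should reconcile the old task instead of
        relaunching it (possibly onto a different agent)."""
        if task_id in self._submitted:
            return False
        self._submitted.add(task_id)
        return True
```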


-- 

Cheers,

Zhitao Li


Re: Welcome Andrew Schwartzmeyer as a new committer and PMC member!

2017-11-27 Thread Zhitao Li
Congratulations!

On Mon, Nov 27, 2017 at 4:18 PM, Aaron Wood  wrote:

> Great work Andrew!
>
> On Nov 27, 2017 6:00 PM, "Joseph Wu"  wrote:
>
> Hi devs & users,
>
> I'm happy to announce that Andrew Schwartzmeyer has become a new committer
> and member of the PMC for the Apache Mesos project.  Please join me in
> congratulating him!
>
> Andrew has been an active contributor to Mesos for about a year.  He has
> been the primary contributor behind our efforts to change our default build
> system to CMake and to port Mesos onto Windows.
>
> Here is his committer candidate checklist for your perusal:
> https://docs.google.com/document/d/1MfJRYbxxoX2-A-
> g8NEeryUdUi7FvIoNcdUbDbGguH1c/
>
> Congrats Andy!
> ~Joseph
>



-- 
Cheers,

Zhitao Li


Re: Subscribe to an active framework through HTTP API Scheduler

2017-10-26 Thread Zhitao Li
Each active framework on the HTTP scheduler API is assigned a stream ID.
This is included in the "Mesos-Stream-Id" header of the initial SUBSCRIBED
response.

If you obtain this stream ID, another process can "impersonate" this
framework to submit kill requests (using the framework ID, the task ID, and
the same credential if auth is enabled).

I cannot find a good place where the Mesos master logs this stream ID, so
you should probably record it on your framework side.
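As a rough sketch of such a request against the v1 scheduler API (the master URL, the IDs, and the `build_kill_call` helper are all placeholders for illustration):

```python
# Sketch of a KILL call against the v1 scheduler API on behalf of an
# already-subscribed framework, reusing its stream ID. build_kill_call is
# an illustrative helper; the master URL and IDs are placeholders.

def build_kill_call(framework_id, task_id, agent_id=None):
    """Build the JSON body of a v1 scheduler KILL call."""
    call = {
        "framework_id": {"value": framework_id},
        "type": "KILL",
        "kill": {"task_id": {"value": task_id}},
    }
    if agent_id is not None:
        call["kill"]["agent_id"] = {"value": agent_id}
    return call

# The request itself would look roughly like this (requests library,
# not run here):
#   requests.post("http://master:5050/api/v1/scheduler",
#                 json=build_kill_call("fw-id", "task-id"),
#                 headers={"Mesos-Stream-Id": stream_id,
#                          "Content-Type": "application/json"})
```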

Hope this is helpful as we have done similar things recently.



On Thu, Oct 26, 2017 at 2:10 AM, Manuel Montesino <
manuel.montes...@piksel.com> wrote:

> Hi,
>
>
> We have a framework with some tasks that we would like to kill but not all
> framework (teardown), so we would like to use the kill method of the http
> api scheduler, the problem is that is needed to be suscribed. Creating a
> new framework in stream mode and executing the kill method it's not allowed
> to kill task for another framework (403 Forbidden).
>
>
> So, is possible to subscribe/connect to an existing framework, or other
> mode to operate into it from the http api?.
>
>
> Thanks in advance.
>
>
> *Manuel Montesino*
> Devops Engineer
>
> *E* *manuel.montesino@piksel(dot)com*
>
> Marie Curie,1. Ground Floor. Campanillas, Malaga 29590
> *liberating viewing* | *piksel.com <http://piksel.com>*
>
> [image: Piksel_Email.png]
>
> This message is private and confidential. If you have received this
> message in error, please notify the sender or serviced...@piksel.com and
> remove it from your system.
>
> Piksel Inc is a company registered in the United States, 2100 Powers
> Ferry Road SE, Suite 400, Atlanta, GA 30339
> <https://maps.google.com/?q=2100+Powers+Ferry+Road+SE,+Suite+400,+Atlanta,+GA+30339&entry=gmail&source=g>
>



-- 
Cheers,

Zhitao Li


Re: Updating running tasks in-place

2017-10-04 Thread Zhitao Li
Thanks for taking the lead, Yan! Replying to your points inline:

On Wed, Oct 4, 2017 at 11:11 AM, Yan Xu  wrote:

> Hi Mesos users/devs,
>
> I am curious about what use cases do folks in the community have about
> updating running tasks? i.e., amending the current task without going
> through the typical kill -> offer -> relaunch process.
>
> Typically you would only want to do that for the "pets
> <https://www.theregister.co.uk/2013/03/18/servers_pets_or_cattle_cern/>"
> in
> your cluster as it adds complexity in managing the tasks' lifecycle but
> nevertheless in some cases it is too expensive to relocate the app or even
> relaunching it onto the same host later.
>
> https://issues.apache.org/jira/browse/MESOS-1280 has some context about
> this. In particular, people have mentioned the desire to:
>
>- Dynamically reconfiguring the task without restarting it.
>- Upgrading the task transparently (i.e., restarting without dropping
>connections)
>

One possible use case we have for this is upgrading service mesh components
(consider something similar to HAProxy): because these instances handle
all connections on the machine, restarting without dropping connections is
a must for them.


>- Replacing tasks with another without going through offer cycles
>

We have concrete use case for this one.


>- Task resizing <https://issues.apache.org/jira/browse/MESOS-1279>
> (which
>is captured in another JIRA)

>   - Certain metadata, e.g., labels (but I imagine not all metadata makes
>equal sense to be updatable).
>
> What other/specific use cases are folks interested in?
>
> Best,
> Yan
>



-- 
Cheers,

Zhitao Li


Re: Welcome James Peach as a new committer and PMC memeber!

2017-09-07 Thread Zhitao Li
Congratulations James! Very well deserved! Looking forward to more great
work!

On Thu, Sep 7, 2017 at 6:19 AM, Klaus Ma  wrote:

> Congrats !!
>
> 
> Da (Klaus), Ma (马达) | PMP® | R&D of IBM Cloud private
> IBM Spectrum Computing, IBM System
> +86-10-8245 4084 <+86%2010%208245%204084> | mad...@cn.ibm.com | @k82cn
> <http://github.com/k82cn>
>
> On Thu, Sep 7, 2017 at 3:08 PM, tommy xiao  wrote:
>
>> Congrats James! Well deserved!
>>
>> 2017-09-07 14:54 GMT+08:00 Ben Lin :
>>
>>> Congrats!!
>>>
>>> --
>>> *From:* Oucema Bellagha 
>>> *Sent:* Thursday, September 7, 2017 2:51:44 PM
>>> *To:* user@mesos.apache.org
>>> *Subject:* Re: Welcome James Peach as a new committer and PMC memeber!
>>>
>>> Congrats my friend !
>>>
>>> --
>>> *From:* xuj...@apple.com  on behalf of Yan Xu <
>>> xuj...@apple.com>
>>> *Sent:* Wednesday, September 6, 2017 9:08:42 PM
>>> *To:* dev; user
>>> *Subject:* Welcome James Peach as a new committer and PMC memeber!
>>>
>>> Hi Mesos devs and users,
>>>
>>> Please welcome James Peach as a new Apache Mesos committer and PMC
>>> member.
>>>
>>> James has been an active contributor to Mesos for over two years now. He
>>> has made many great contributions to the project which include XFS disk
>>> isolator, improvement to Linux capabilities support and IPC namespace
>>> isolator. He's super active on the mailing lists and slack channels, always
>>> eager to help folks in the community and he has been helping with a lot of
>>> Mesos reviews as well.
>>>
>>> Here is his formal committer candidate checklist:
>>>
>>> https://docs.google.com/document/d/19G5zSxhrRBdS6GXn9KjCznjX
>>> 3cp0mUbck6Jy1Hgn3RY/edit?usp=sharing
>>> <https://docs.google.com/document/d/19G5zSxhrRBdS6GXn9KjCznjX3cp0mUbck6Jy1Hgn3RY/edit?usp=sharing>
>>>
>>> Congrats James!
>>>
>>> Yan
>>>
>>>
>>
>>
>> --
>> Deshi Xiao
>> Twitter: xds2000
>> E-mail: xiaods(AT)gmail.com
>>
>
>


-- 
Cheers,

Zhitao Li


Re: [VOTE] Release Apache Mesos 1.4.0 (rc1)

2017-08-21 Thread Zhitao Li
+1 (nonbinding)

Tested by running `make check` on a debian/jessie server on AWS.

On Fri, Aug 18, 2017 at 12:27 PM, Kapil Arya  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.4.0.
>
> 1.4.0 includes the following:
> 
>
>   * Ability to recover the agent ID after a host reboot.
>   * File-based and image-pull secrets.
>   * Linux ambient and bounding capabilities support.
>   * Hierarchical resource allocation roles. [EXPERIMENTAL]
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
> lain;f=CHANGELOG;hb=1.4.0-rc1
> 
>
>
> The candidate for Mesos 1.4.0 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.4.0-rc1/mesos-1.4.0.tar.gz
>
> The tag to be voted on is 1.4.0-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.4.0-rc1
>
> The MD5 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.4.0-rc1/mesos
> -1.4.0.tar.gz.md5
>
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.4.0-rc1/mesos
> -1.4.0.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is up in Maven in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1206
>
> Please vote on releasing this package as Apache Mesos 1.4.0!
>
> The vote is open until Wed. Aug 23, 2017 11:59:59 PM PDT, and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.4.0
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Anand and Kapil
>
>


-- 
Cheers,

Zhitao Li


Re: Welcome Greg Mann as a new committer and PMC member!

2017-06-16 Thread Zhitao Li
Congratulations, Greg! It's such a pleasure to work with you, and special
thanks for all the effort on the security aspect of Mesos.

On Thu, Jun 15, 2017 at 4:12 PM, Benjamin Mahler  wrote:

> Thanks for all that you've done for the project so far Greg, it's been a
> pleasure working with you.
>
> Congrats and welcome!
>
> On Tue, Jun 13, 2017 at 2:42 PM, Vinod Kone  wrote:
>
> > Hi folks,
> >
> > Please welcome Greg Mann as the newest committer and PMC member of the
> > Apache Mesos project.
> >
> > Greg has been an active contributor to the Mesos project for close to 2
> > years now and has made many solid contributions. His biggest source code
> > contribution to the project has been around adding authentication support
> > for default executor. This was a major new feature that involved quite a
> > few moving parts. Additionally, he also worked on improving the scheduler
> > and executor APIs.
> >
> > Here is his more formal checklist for your perusal.
> >
> > https://docs.google.com/document/d/1S6U5OFVrl7ySmpJsfD4fJ3_
> > R8JYRRc5spV0yKrpsGBw/edit
> >
> > Thanks,
> > Vinod
> >
> >
>



-- 
Cheers,

Zhitao Li


Re: Welcome Gilbert Song as a new committer and PMC member!

2017-05-24 Thread Zhitao Li
Congrats Gilbert!

On Wed, May 24, 2017 at 11:08 AM, Yan Xu  wrote:

> Congrats! Well deserved!
>
> ---
> Jiang Yan Xu  | @xujyan <https://twitter.com/xujyan>
>
> On Wed, May 24, 2017 at 10:54 AM, Vinod Kone  wrote:
>
>> Congrats Gilbert!
>>
>> On Wed, May 24, 2017 at 1:32 PM, Neil Conway 
>> wrote:
>>
>> > Congratulations Gilbert! Well-deserved!
>> >
>> > Neil
>> >
>> > On Wed, May 24, 2017 at 10:32 AM, Jie Yu  wrote:
>> > > Hi folks,
>> > >
>> > > I' happy to announce that the PMC has voted Gilbert Song as a new
>> > committer
>> > > and member of PMC for the Apache Mesos project. Please join me to
>> > > congratulate him!
>> > >
>> > > Gilbert has been working on Mesos project for 1.5 years now. His main
>> > > contribution is his work on unified containerizer, nested container
>> (aka
>> > > Pod) support. He also helped a lot of folks in the community regarding
>> > their
>> > > patches, questions and etc. He also played an important role
>> organizing
>> > > MesosCon Asia last year and this year!
>> > >
>> > > His formal committer checklist can be found here:
>> > > https://docs.google.com/document/d/1iSiqmtdX_0CU-YgpViA6r6PU_
>> > aMCVuxuNUZ458FR7Qw/edit?usp=sharing
>> > >
>> > > Welcome, Gilbert!
>> > >
>> > > - Jie
>> >
>>
>
>


-- 
Cheers,

Zhitao Li


Plan for upgrading protobuf==3.2.0 in Mesos

2017-04-25 Thread Zhitao Li
Dear framework owners and users,

We are working on upgrading the protobuf library in Mesos to 3.2.0 in
https://issues.apache.org/jira/browse/MESOS-7228, to overcome some protobuf
limitations on message size as well as to prepare for further improvements.
We aim to release this with the upcoming Mesos 1.3.0.

Because we upgraded the protoc compiler in this process, all generated java
and python code may not be compatible with protobuf 2.6.1 (the previous
dependency), and we ask you to upgrade the protobuf dependency to 3.2.0
when you upgrade your framework dependency to 1.3.0.

For java, a snapshot maven artifact has been prepared (by Anand Mazumdar's
courtesy) at
https://repository.apache.org/content/repositories/snapshots/org/apache/mesos/mesos/1.3.0-SNAPSHOT/
. Please feel free to play with it and let us know if you run into any
issues.

Note that the binary upgrade process should still be compatible: any Java-
or Python-based framework (scheduler or executor) should still work out of
the box with Mesos 1.3.0 once released. We suggest upgrading your cluster
to 1.3.0 first, then coming back to upgrade your executors and schedulers.

We understand this may cause some inconvenience around updating the
protobuf dependency, so please let us know if you have any concerns or
further questions.

-- 

Cheers,

Zhitao Li and Anand Mazumdar,


Re: Providing end-user feedback on Docker image download progress

2017-01-10 Thread Zhitao Li
Hi Franck,

Can you enable comment on the doc? Thanks!

On Tue, Jan 10, 2017 at 6:52 AM, Frank Scholten 
wrote:

> Here is a very rudimentary design doc on image download progress:
> https://docs.google.com/document/d/1x9dtcNgwecAp1xAHeDH-
> FJOJRIWC3OfC2akX8LLWNcs/edit?usp=sharing
>
> Feel free to comment.
>
> Cheers,
>
> Frank
>
>
>
> On Tue, Jan 10, 2017 at 8:49 AM, Frank Scholten 
> wrote:
> > Hi Jie,
> >
> > Great!
> >
> > What are the next steps here? Create a design document?
> >
> > Cheers,
> >
> > Frank
> >
> >
> > On Mon, Jan 9, 2017 at 8:38 PM, Jie Yu  wrote:
> >> Frank,
> >>
> >> Thanks for reaching out! I think this is definitely something we've
> thought
> >> about, just don't have the cycle to get it prioritized.
> >> https://issues.apache.org/jira/browse/MESOS-2256
> >>
> >> The idea is around re-using STAGING state with more information about
> the
> >> progress of the provisioning (and fetching). cc Vinod, BenM, BenH
> >>
> >>> We want to contribute this work back to the project and like to know
> >>>
> >>> which of the above and other options are the most viable.
> >>
> >>
> >> That's great!
> >>
> >> - Jie
> >>
> >> On Mon, Jan 9, 2017 at 3:07 AM, Frank Scholten 
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Together with a client we are looking into ways to provide end-user
> >>> feedback on Docker image downloads for Mesos tasks.
> >>>
> >>> The idea is that an end user who uses a cli tool that connects to a
> >>> Mesos scheduler gets image dowload and provisioning progress on it's
> >>> standard out stream every few seconds.
> >>>
> >>> Our client has a Mesos framework that runs large Docker images via the
> >>> Mesos Containerizer and we like to know what our options are of adding
> >>> this feature either to the framework, to a Mesos module, or to Mesos
> >>> itself using some sort of API or a new task state next to STAGING like
> >>> DOWNLOADING including detailed progress information.
> >>>
> >>> We want to contribute this work back to the project and like to know
> >>> which of the above and other options are the most viable.
> >>>
> >>> Cheers,
> >>>
> >>> Frank
> >>
> >>
>



-- 
Cheers,

Zhitao Li


Proposal for evaluating Mesos scalability and robustness through stress test.

2017-01-06 Thread Zhitao Li
(sending this again since previous attempt seemed bumped back)

Hi folks,

Like all of you, we are super excited to use Mesos to manage thousands of
different applications on our large-scale clusters. As the number of
applications and hosts keeps increasing, we are getting more and more
curious about the potential scalability limits/bottlenecks of Mesos'
centralized architecture and about its robustness in the face of various
failures. If we can identify them in advance, we can manage and optimize
them before suffering any performance degradation.

To explore Mesos' capability and break the knowledge gap, we have a
proposal to evaluate Mesos scalability and robustness through stress test,
the draft of which can be found at: draft_link
<https://docs.google.com/document/d/10kRtX4II74jfUuHJnX2F5teqpXzHYFQAZGWjCdS3cZA/edit?usp=sharing>.
Please feel free to provide your suggestions and feedback by commenting on
the draft.

Many of you probably have similar questions. We will be happy to share our
findings from these experiments with the Mesos community. Please stay
tuned.

-- 
Cheers,

Ao Ma & Zhitao Li


Re: Optimize libprocess performance

2017-01-04 Thread Zhitao Li
Strongly +1 for having some initial benchmarks as a baseline before
optimizations are implemented.

On Wed, Jan 4, 2017 at 5:26 PM, Benjamin Mahler  wrote:

> Which areas does the performance not meet your needs? There are a lot of
> aspects to libprocess that can be optimized, so it would be good to focus
> on each of your particular use cases via benchmarks, this allows us to have
> a shared way to profile and measure improvements.
>
> Copy elimination is one area where a lot of improvement can be made across
> libprocess, note that libprocess was implemented before we had C++11 move
> support available. We've recently made some improvements to update the HTTP
> serving path towards zero-copies but it's not completely done. Can you
> submit patches for the ProcessBase::send() path copy elimination? We can
> have a move overload for ProcessBase::send and have ProtobufProcess::send()
> and encode() perform moves instead of a copy.
>
> With respect to the MessageEncoder, since it's less trivial, you can submit
> a benchmark that captures the use case you care about and we can drive
> improvements using it. I have some suggestions here as well but we can
> discuss once we have the benchmarks committed.
>
> How does that sound to start?
>
> On Tue, Jan 3, 2017 at 7:31 PM, pangbingqiang 
> wrote:
>
> > Hi All:
> >
> >   We use libprocess as our underlying communication library, but we find
> > its performance doesn't meet our needs and we want to optimize it. For
> > example, the *'send' function* implementation copies one message four
> > times in memory:
> >
> > *1. ProtobufMessage::SerializeToString, then the ProcessBase 'encode'
> > path constructs a string once;*
> >
> > *2. in the 'encode' function, the message body is copied again;*
> >
> > *3. in MessageEncoder, constructing the HTTP request copies again;*
> >
> > *4. returning from MessageEncoder copies again.*
> >
> >   Suggestions on how to optimize this scenario would be useful.
> >
> >   Also, libprocess has many locks:
> >
> > *1.   SocketManager:   std::recursive_mutex mutex;*
> >
> > *2.   ProcessManager:  std::recursive_mutex processes_mutex;
> > std::recursive_mutex runq_mutex; std::recursive_mutex firewall_mutex;*
> >
> > In particular, every event enqueue/dequeue needs to acquire a lock;
> > a lock-free structure might be better.
> >
> >
> >
> > If you have any optimization suggestions or points for discussion,
> please let me know, thanks.
> >
> >
> >
> >
> >
> >
> > Bingqiang Pang(庞兵强)
> >
> >
> >
> > Distributed and Parallel Software Lab
> >
> > Huawei Technologies Co., Ltd.
> >
> > Email:pangbingqi...@huawei.com 
> >
> >
> >
> >
> >
>



-- 
Cheers,

Zhitao Li


Re: Structured logging for Mesos (or c++ glog)

2016-12-20 Thread Zhitao Li
Hi Otis,

Thanks for the good summary. The conversation is mostly about 1) in this
thread, because right now Mesos logs are not really structured, or at least
most of them are not.

On Tue, Dec 20, 2016 at 6:57 AM, Otis Gospodnetić <
otis.gospodne...@gmail.com> wrote:

> Hi Zhitao,
>
> When people talk about structure and logging it typically means two things:
>
> 1) make the log format a known/standard format where all its elements are
> known, and thus it's easy to parse them; a log event can still be a single
> line, but it can also be multi-line or JSON or some other (even binary)
> format.  As long as the format/structure is known, the log event *is*
> structured.
>
> 2) I want tools/configs/patterns that will let me easily parse this log
> event structure and send it somewhere (e.g. Elasticsearch or Logsene
> <http://sematext.com/logsene> or ...) where this structure will be
> handled in the way that lets me easy filtering/slicing and dicing by one or
> more attributes/fields extracted from the log event structure.
>
> *For 1*):
> I'm assuming Mesos logs already are structured.  I assume their format is
> either widely known (like Apache common log format, for example), or
> well-documented (again like Apache common log format).  If that is not
> true, then yes, Mesos devs will want to do document the structure.  I've
> looked at https://mesos.apache.org/documentation/latest/logging/ but saw
> nothing mentioning the structure.  Maybe this info is somewhere else?
>
> *For 2)*
> This is where modern log shippers come in. We open-sourced our Logagent
> <https://github.com/sematext/logagent-js> (more info here
> <http://sematext.com/logagent/>), which has log parsing (and thus
> structuring) built-in.  It ships with a bunch of log patterns/parsers, and
> one can add new ones (e.g. for Mesos).  Elasticsearch, mentioned in this
> thread, is one of the outputs.  It's sort of like Filebeat+Logstash in one,
> and it's often used in Dockerized deployments, as part of this Docker
> agent <https://sematext.com/docker/>.  One could also use Logstash for
> parsing/structuring, but Logstash is a bit heavy.
>
> I hope this helps.
>
> Otis
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> On Mon, Dec 19, 2016 at 6:03 PM, Zhitao Li  wrote:
>
>> Charles,
>>
>> Thanks for sharing the pattern. If my reading is right, this will extract
>> the entire message line as one string. What I'm looking for is: on top of
>> extracting the entire message line, also break it into structured fields
>> automatically.
>>
>>
>>
>> On Mon, Dec 19, 2016 at 1:59 PM, Charles Allen <
>> charles.al...@metamarkets.com> wrote:
>>
>>> For what it's worth, we use SumoLogic and the magic parsing search looks
>>> like
>>> this:
>>>
>>> parse regex field=message "^(?<severity>[IWE])(?<datetime>[0-9]{4}
>>> [0-9:.]*) [0-9]*
>>> (?<file>[0-9a-zA-Z.]*):(?<line>[0-9]*)]
>>> (?<message>.*)$"
>>>
>>>
>>>
>>> On Mon, Dec 19, 2016 at 11:15 AM Joris Van Remoortere <
>>> jo...@mesosphere.io>
>>> wrote:
>>>
>>> > @Zhitao are you looking specifically for structure or just for tagging?
>>> > glog does already have support for custom tags in the header. I don't
>>> know
>>> > if this is enough for your use case though.
>>> >
>>> > —
>>> > *Joris Van Remoortere*
>>> > Mesosphere
>>> >
>>> > On Mon, Dec 19, 2016 at 9:58 AM, James Peach  wrote:
>>> >
>>> >
>>> > > On Dec 19, 2016, at 9:43 AM, Zhitao Li 
>>> wrote:
>>> > >
>>> > > Hi,
>>> > >
>>> > > I'm looking at how to better utilize ElasticSearch to perform log
>>> > analysis for logs from Mesos. It seems like ElasticSearch would
>>> generally
>>> > work better for structured logging, but Mesos still uses glog thus all
>>> logs
>>> > produced are old-school unstructured lines.
>>> > >
>>> > > I wonder whether anyone has brought the conversation of making Mesos
>>> > logs easier to process, or if anyone has experience to share.
>>> >
>>> > Are you trying to stitch together sequences of events? In that case,
>>> would
>>> > direct event logging be more useful?
>>> >
>>> > J
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> Cheers,
>>
>> Zhitao Li
>>
>
>


-- 
Cheers,

Zhitao Li


Re: Structured logging for Mesos (or c++ glog)

2016-12-19 Thread Zhitao Li
Great.

I also found this old thread
http://search-hadoop.com/m/Mesos/0Vlr6meKs116T2k1?subj=Mapped+diagnostics+context+Adding+internal+Mesos+IDs+as+context+to+the+logs
on the dev list, where it seems no consensus was reached.

Maybe we can talk about this in the next community sync?

On Mon, Dec 19, 2016 at 3:25 PM, James Peach  wrote:

>
> > On Dec 19, 2016, at 2:54 PM, Zhitao Li  wrote:
> >
> > Hi James,
> >
> > Stitching events together is only one possible use case, and I'm not
> exactly sure what you meant by direct event logging.
> >
> > Taking the hierarchical allocator for example. In a multi-framework
> cluster, sometimes I want to comb through various loggings and present a
> trace on how allocation has affected a particular framework (by its
> framework id) and/or w.r.t an agent (by its agent id).
> >
> > Being able to systematically extract structured field values like
> framework_id or agent_id automatically from all logs, regardless of the
> actual logging pattern, will be tremendously valuable in such use cases.
>
> I think we are talking about similar things. Many servers do both
> free-form error logging and structured event logging. I'm thinking of event
> logging formats are customizable by the operator and allow the
> interpolation of context-specific data item (eg. HTTP access logs from many
> different server implementations).
>
> J




-- 
Cheers,

Zhitao Li


Re: Structured logging for Mesos (or c++ glog)

2016-12-19 Thread Zhitao Li
Charles,

Thanks for sharing the pattern. If my reading is right, this will extract
the entire message line as one string. What I'm looking for is: on top of
extracting the entire message line, also break it into structured fields
automatically.



On Mon, Dec 19, 2016 at 1:59 PM, Charles Allen <
charles.al...@metamarkets.com> wrote:

> For what it's worth, we use SumoLogic and the magic parsing search looks like
> this:
>
> parse regex field=message "^(?<severity>[IWE])(?<datetime>[0-9]{4}
> [0-9:.]*) [0-9]*
> (?<file>[0-9a-zA-Z.]*):(?<line>[0-9]*)]
> (?<message>.*)$"
>
>
>
> On Mon, Dec 19, 2016 at 11:15 AM Joris Van Remoortere  >
> wrote:
>
> > @Zhitao are you looking specifically for structure or just for tagging?
> > glog does already have support for custom tags in the header. I don't
> know
> > if this is enough for your use case though.
> >
> > —
> > *Joris Van Remoortere*
> > Mesosphere
> >
> > On Mon, Dec 19, 2016 at 9:58 AM, James Peach  wrote:
> >
> >
> > > On Dec 19, 2016, at 9:43 AM, Zhitao Li  wrote:
> > >
> > > Hi,
> > >
> > > I'm looking at how to better utilize ElasticSearch to perform log
> > analysis for logs from Mesos. It seems like ElasticSearch would generally
> > work better for structured logging, but Mesos still uses glog thus all
> logs
> > produced are old-school unstructured lines.
> > >
> > > I wonder whether anyone has brought the conversation of making Mesos
> > logs easier to process, or if anyone has experience to share.
> >
> > Are you trying to stitch together sequences of events? In that case, would
> > direct event logging be more useful?
> >
> > J
> >
> >
> >
>



-- 
Cheers,

Zhitao Li


Re: Structured logging for Mesos (or c++ glog)

2016-12-19 Thread Zhitao Li
Joris,

I am particularly looking for structure. We have mechanisms to add static
tags easily to logs collected into ELK.

If there is a way to dynamically inject tags like "framework_id" at the
actual logging call, it might be a starting point for me.

I cannot find a good reference on how to add tagging for glog though. Do
you have any reference?

On Mon, Dec 19, 2016 at 11:15 AM, Joris Van Remoortere 
wrote:

> @Zhitao are you looking specifically for structure or just for tagging?
> glog does already have support for custom tags in the header. I don't know
> if this is enough for your use case though.
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Mon, Dec 19, 2016 at 9:58 AM, James Peach  wrote:
>
> >
> > > On Dec 19, 2016, at 9:43 AM, Zhitao Li  wrote:
> > >
> > > Hi,
> > >
> > > I'm looking at how to better utilize ElasticSearch to perform log
> > analysis for logs from Mesos. It seems like ElasticSearch would generally
> > work better for structured logging, but Mesos still uses glog thus all
> logs
> > produced are old-school unstructured lines.
> > >
> > > I wonder whether anyone has brought the conversation of making Mesos
> > logs easier to process, or if anyone has experience to share.
> >
> > Are you trying to stitch together sequences of events? In that case, would
> > direct event logging be more useful?
> >
> > J
>



-- 
Cheers,

Zhitao Li
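Whether or not glog's header tags turn out to suffice, a pattern that works with any logger is to append key=value pairs to the message body itself, where a single downstream extraction rule (Logstash grok, Sumo parse, etc.) can pick them up. A sketch of the idea — illustrative only, not a glog or Mesos API; in C++ this would wrap `LOG(INFO)` rather than return a string:

```python
def tagged(message, **tags):
    """Append sorted key=value tags to a log message so a single
    downstream pattern (Logstash grok, Sumo parse, ...) can extract
    them. Illustrative only -- not a glog or Mesos API."""
    suffix = " ".join("%s=%s" % (k, v) for k, v in sorted(tags.items()))
    return "%s [%s]" % (message, suffix) if suffix else message

line = tagged("Sent offer", framework_id="fw-1", agent_id="agent-7")
# line == "Sent offer [agent_id=agent-7 framework_id=fw-1]"
```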


Re: Structured logging for Mesos (or c++ glog)

2016-12-19 Thread Zhitao Li
Hi James,

Stitching events together is only one possible use case, and I'm not
exactly sure what you meant by direct event logging.

Taking the hierarchical allocator for example. In a multi-framework
cluster, sometimes I want to comb through various logs and present a
trace on how allocation has affected a particular framework (by its
framework id) and/or w.r.t an agent (by its agent id).

Being able to systematically extract structured field values like
framework_id or agent_id, regardless of the actual logging pattern, would
be tremendously valuable in such use cases.

On Mon, Dec 19, 2016 at 9:58 AM, James Peach  wrote:

>
> > On Dec 19, 2016, at 9:43 AM, Zhitao Li  wrote:
> >
> > Hi,
> >
> > I'm looking at how to better utilize ElasticSearch to perform log
> analysis for logs from Mesos. It seems like ElasticSearch would generally
> work better for structured logging, but Mesos still uses glog thus all logs
> produced are old-school unstructured lines.
> >
> > I wonder whether anyone has brought the conversation of making Mesos
> logs easier to process, or if anyone has experience to share.
>
> > Are you trying to stitch together sequences of events? In that case, would
> direct event logging be more useful?
>
> J




-- 
Cheers,

Zhitao Li
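Until fields like framework_id are extracted structurally, the fallback for building the kind of per-framework allocation trace described above is plain substring filtering — a hypothetical sketch:

```python
def framework_trace(lines, framework_id):
    """Keep only the log lines mentioning a given framework id --
    the unstructured stopgap for the trace described above."""
    return [line for line in lines if framework_id in line]

logs = [
    "I1219 ... Added framework fw-42",
    "I1219 ... Sending offer to framework fw-7",
    "I1219 ... Recovered resources from framework fw-42",
]
trace = framework_trace(logs, "fw-42")  # keeps the first and last line
```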


Structured logging for Mesos (or c++ glog)

2016-12-19 Thread Zhitao Li
Hi,

I'm looking at how to better utilize ElasticSearch to perform log analysis
for logs from Mesos. It seems like ElasticSearch would generally work
better for structured logging, but Mesos still uses glog thus all logs
produced are old-school unstructured lines.

I wonder whether anyone has brought the conversation of making Mesos logs
easier to process, or if anyone has experience to share.

Thanks!

-- 
Cheers,

Zhitao Li


Re: Welcome Haosdent Huang as Mesos Committer and PMC member!

2016-12-16 Thread Zhitao Li
Congrats Haosdent! Well deserved!

So glad and honored to work with you! I'm very impressed by the amount you
have contributed across so many tasks.



On Fri, Dec 16, 2016 at 10:59 AM, Vinod Kone  wrote:

> Hi folks,
>
> Please join me in formally welcoming Haosdent Huang as Mesos Committer and
> PMC member.
>
> Haosdent has been an active contributor to the project for more than a year
> now. He has contributed a number of patches and features to the Mesos code
> base, most notably the unified cgroups isolator and health check
> improvements. The most impressive thing about him is that he always
> volunteers to help out people in the community, be it on slack/IRC or
> mailing lists. The fact that he does all this even though working on Mesos
> is not part of his day job is even more impressive.
>
> Here is his more formal checklist
> <https://docs.google.com/document/d/1wq-M4KoMOJWZTNTN-
> hvy-H8ZGLXG6CF9VP2IY_UU5_0/edit?ts=57e0029d>
> for your perusal.
>
> Thanks,
> Vinod
>
> P.S: Sorry for the delay in sending the welcome email.
>



-- 
Cheers,

Zhitao Li


Re: mesos cpuset isolator module available

2016-12-15 Thread Zhitao Li
Thanks for sharing. This is very interesting to us because we are also
looking for a solution for latency-sensitive CPU isolation.

On Thu, Dec 15, 2016 at 9:56 AM, ct clmsn  wrote:

> I'll add in BUILD instructions tonight/this weekend. I'll be releasing
> some performance counter tools to use in a mesos system (for container
> applications) very soon.
>
> On Thu, Dec 15, 2016 at 12:13 PM, tommy xiao  wrote:
>
>> thanks for your sharing. v5
>>
>> 2016-12-15 23:40 GMT+08:00 ct clmsn :
>>
>>> I've completed a mesos module to support cgroups cpusets. This work is
>>> related to a JIRA ticket that I posted last spring (MESOS-5342). Apologies
>>> for the long delay wrapping up the implementation.
>>>
>>> https://github.com/ct-clmsn/mesos-cpusets
>>>
>>> If you test it out, have issues, or want to make improvements, please
>>> post to github - I've done some very simple/trivial testing.
>>>
>>> Chris
>>>
>>
>>
>>
>> --
>> Deshi Xiao
>> Twitter: xds2000
>> E-mail: xiaods(AT)gmail.com
>>
>
>


-- 
Cheers,

Zhitao Li
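For context on what a cpuset isolator like the module above ultimately does: it pins a container by writing the cgroup's `cpuset.cpus` and `cpuset.mems` control files. A minimal sketch — the real files live under `/sys/fs/cgroup/cpuset/<container>`; here a plain directory stands in so the example runs anywhere:

```python
import os
import tempfile

def assign_cpuset(cgroup_dir, cpus, mems="0"):
    """Write the cgroup-v1 cpuset control files that pin a container
    to specific CPUs and memory nodes."""
    with open(os.path.join(cgroup_dir, "cpuset.cpus"), "w") as f:
        f.write(",".join(str(c) for c in cpus))
    with open(os.path.join(cgroup_dir, "cpuset.mems"), "w") as f:
        f.write(mems)

cgroup = tempfile.mkdtemp()  # stand-in for /sys/fs/cgroup/cpuset/<container>
assign_cpuset(cgroup, [0, 1, 3])
```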


Re: [VOTE] Release Apache Mesos 1.1.0 (rc1)

2016-10-21 Thread Zhitao Li
t for HTTP and HTTPS health
>> checks.
>> Executors may now use the updated `HealthCheck` protobuf to implement
>> HTTP(S) health checks. Both default executors (command and docker)
>> leverage
>> `curl` binary for sending HTTP(S) requests and connect to `127.0.0.1`,
>> hence a task must listen on all interfaces. On Linux, For BRIDGE and
>> USER
>> modes, docker executor enters the task's network namespace.
>>
>>   * [MESOS-3421] - **Experimental** Support sharing of resources across
>> containers. Currently persistent volumes are the only resources
>> allowed to
>> be shared.
>>
>>   * [MESOS-3567] - **Experimental** support for TCP health checks.
>> Executors
>> may now use the updated `HealthCheck` protobuf to implement TCP health
>> checks. Both default executors (command and docker) connect to
>> `127.0.0.1`,
>> hence a task must listen on all interfaces. On Linux, For BRIDGE and
>> USER
>> modes, docker executor enters the task's network namespace.
>>
>>   * [MESOS-4324] - Allow access to persistent volumes as read-only or
>> read-write
>> by tasks. Mesos doesn't allow persistent volumes to be created as
>> read-only
>> but in 1.1 it starts allowing tasks to use the volumes as read-only.
>> This is
>> mainly motivated by shared persistent volumes but applies to regular
>> persistent volumes as well.
>>
>>   * [MESOS-5275] - **Experimental** support for linux capabilities.
>> Frameworks
>> or operators now have fine-grained control over the capabilities that
>> a
>> container may have. This allows a container to run as root, but not
>> have all
>> the privileges associated with the root user (e.g., CAP_SYS_ADMIN).
>>
>>   * [MESOS-5344] -- **Experimental** support for partition-aware Mesos
>> frameworks. In previous Mesos releases, when an agent is partitioned
>> from
>> the master and then reregisters with the cluster, all tasks running
>> on the
>> agent are terminated and the agent is shutdown. In Mesos 1.1,
>> partitioned
>> agents will no longer be shutdown when they reregister with the
>> master. By
>> default, tasks running on such agents will still be killed (for
>> backward
>> compatibility); however, frameworks can opt-in to the new
>> PARTITION_AWARE
>> capability. If they do this, their tasks will not be killed when a
>> partition
>> is healed. This allows frameworks to define their own policies for
>> how to
>> handle partitioned tasks. Enabling the PARTITION_AWARE capability also
>> introduces a new set of task states: TASK_UNREACHABLE, TASK_DROPPED,
>> TASK_GONE, TASK_GONE_BY_OPERATOR, and TASK_UNKNOWN. These new states
>> are
>> intended to eventually replace the TASK_LOST state.
>>
>>   * [MESOS-6077] - **Experimental** A new default executor is introduced
>> which
>> frameworks can use to launch task groups as nested containers. All the
>> nested containers share resources like cpu, memory, network and
>> volumes.
>>
>>   * [MESOS-6014] - **Experimental** A new port-mapper CNI plugin, the
>> `mesos-cni-port-mapper` has been introduced. For Mesos containers,
>> with the
>> CNI port-mapper plugin, users can now expose container ports through
>> host
>> ports using DNAT. This is especially useful when Mesos containers are
>> attached to isolated CNI networks such as private bridge networks,
>> and the
>> services running in the container need to be exposed outside these
>> isolated networks.
>>
>>
>> The CHANGELOG for the release is available at:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
>> lain;f=CHANGELOG;hb=1.1.0-rc1
>> 
>> 
>>
>> The candidate for Mesos 1.1.0 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc1/mesos-1.1.0.tar.gz
>>
>> The tag to be voted on is 1.1.0-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.1.0-rc1
>>
>> The MD5 checksum of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc1/mesos
>> -1.1.0.tar.gz.md5
>>
>> The signature of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc1/mesos
>> -1.1.0.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1158
>>
>> Please vote on releasing this package as Apache Mesos 1.1.0!
>>
>> The vote is open until Fri Oct 21 21:57:02 CEST 2016 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 1.1.0
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>> Alex & Till
>>
>>
>


-- 
Cheers,

Zhitao Li
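The TCP health checks introduced by MESOS-3567 in the changelog above boil down to: the task is healthy iff a TCP connection to `127.0.0.1:<port>` succeeds. A sketch of that logic (not the actual Mesos health-check code):

```python
import socket

def tcp_health_check(port, host="127.0.0.1", timeout=1.0):
    """Healthy iff a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo: a listening socket counts as healthy.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
healthy = tcp_health_check(port)  # True while the server is listening
server.close()
```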


Re: Performance regression in v1 api vs v0

2016-10-17 Thread Zhitao Li
>>>> Dario
>>>>
>>>> On Oct 16, 2016, at 11:01 AM, Anand Mazumdar  wrote:
>>>>
>>>> Dario,
>>>>
>>>> Thanks for reporting this. Did you test this with 1.0 or the recent
>>>> HEAD? We had done performance testing prior to 1.0rc1 and had not found any
>>>> substantial discrepancy on the call ingestion path. Hence, we had focussed
>>>> on fixing the performance issues around writing events on the stream in
>>>> MESOS-5222 <https://issues.apache.org/jira/browse/MESOS-5222> and
>>>> MESOS-5457 <https://issues.apache.org/jira/browse/MESOS-5457>.
>>>>
>>>> The numbers in the benchmark test pointed by Haosdent (v0 vs v1) differ
>>>> due to the slowness of the client (scheduler library) in processing the
>>>> status update events. We should add another benchmark that measures just
>>>> the time taken by the master to write the events. I would file an issue
>>>> shortly to address this.
>>>>
>>>> Do you mind filing an issue with more details on your test setup?
>>>>
>>>> -anand
>>>>
>>>> On Sun, Oct 16, 2016 at 12:05 AM, Dario Rexin  wrote:
>>>>
>>>>> Hi haosdent,
>>>>>
>>>>> thanks for the pointer! Your results show exactly what I’m
>>>>> experiencing. I think especially for bigger clusters this could be very
>>>>> problematic. It would be great to get some input from the folks working on
>>>>> the HTTP API, especially Anand.
>>>>>
>>>>> Thanks,
>>>>> Dario
>>>>>
>>>>> On Oct 16, 2016, at 12:01 AM, haosdent  wrote:
>>>>>
>>>>> Hmm, this is an interesting topic. @anandmazumdar created a benchmark
>>>>> test case to compare v1 and v0 APIs before. You could run it via
>>>>>
>>>>> ```
>>>>> ./bin/mesos-tests.sh --benchmark --gtest_filter="*SchedulerReco
>>>>> ncileTasks_BENCHMARK_Test*"
>>>>> ```
>>>>>
>>>>> Here is the result that run it in my machine.
>>>>>
>>>>> ```
>>>>> [ RUN  ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerLibrary/0
>>>>> Reconciling 1000 tasks took 386.451108ms using the scheduler library
>>>>> [   OK ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerLibrary/0 (479 ms)
>>>>> [ RUN  ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerLibrary/1
>>>>> Reconciling 1 tasks took 3.389258444secs using the scheduler
>>>>> library
>>>>> [   OK ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerLibrary/1 (3435 ms)
>>>>> [ RUN  ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerLibrary/2
>>>>> Reconciling 5 tasks took 16.624603964secs using the scheduler
>>>>> library
>>>>> [   OK ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerLibrary/2 (16737 ms)
>>>>> [ RUN  ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerLibrary/3
>>>>> Reconciling 10 tasks took 33.134018718secs using the scheduler
>>>>> library
>>>>> [   OK ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerLibrary/3 (3 ms)
>>>>> [ RUN  ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerDriver/0
>>>>> Reconciling 1000 tasks took 24.212092ms using the scheduler driver
>>>>> [   OK ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerDriver/0 (89 ms)
>>>>> [ RUN  ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerDriver/1
>>>>> Reconciling 1 tasks took 316.115078ms using the scheduler driver
>>>>> [   OK ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerDriver/1 (385 ms)
>>>>> [ RUN  ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerDriver/2
>>>>> Reconciling 5 tasks took 1.239050154secs using the scheduler driver
>>>>> [   OK ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerDriver/2 (1379 ms)
>>>>> [ RUN  ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerDriver/3
>>>>> Reconciling 10 tasks took 2.38445672secs using the scheduler driver
>>>>> [   OK ] Tasks/SchedulerReconcileTasks_
>>>>> BENCHMARK_Test.SchedulerDriver/3 (2711 ms)
>>>>> ```
>>>>>
>>>>> *SchedulerLibrary* is the HTTP API, *SchedulerDriver* is the old way
>>>>> based on libmesos.so.
>>>>>
>>>>> On Sun, Oct 16, 2016 at 2:41 PM, Dario Rexin  wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I recently did some performance testing on the v1 scheduler API and
>>>>>> found that throughput is around 10x lower than for the v0 API. Using 1
>>>>>> connection, I don’t get a lot more than 1,500 calls per second, where the
>>>>>> v0 API can do ~15,000. If I use multiple connections, throughput maxes 
>>>>>> out
>>>>>> at 3 connections and ~2,500 calls / s. If I add any more connections, the
>>>>>> throughput per connection drops and the total throughput stays around
>>>>>> ~2,500 calls / s. Has anyone done performance testing on the v1 API 
>>>>>> before?
>>>>>> It seems a little strange to me, that it’s so much slower, given that the
>>>>>> v0 API also uses HTTP (well, more or less). I would be thankful for any
>>>>>> comments and experience reports of other users.
>>>>>>
>>>>>> Thanks,
>>>>>> Dario
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Deshi Xiao
>>> Twitter: xds2000
>>> E-mail: xiaods(AT)gmail.com
>>>
>>
>>
>>
>> --
>> Anand Mazumdar
>>
>>
>>
>


-- 
Cheers,

Zhitao Li
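Plugging the 10,000-task timings from the benchmark output above into a quick rate calculation reproduces the roughly 10x gap Dario measured:

```python
TASKS = 10_000

# Seconds to reconcile 10,000 tasks, taken from the benchmark run above.
v1_library_secs = 3.389258444   # scheduler library (v1 HTTP API)
v0_driver_secs = 0.316115078    # scheduler driver (v0 libmesos)

rate_v1 = TASKS / v1_library_secs   # ~2,950 tasks/s
rate_v0 = TASKS / v0_driver_secs    # ~31,600 tasks/s
slowdown = rate_v0 / rate_v1        # ~10.7x, matching the ~10x observation
```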


Re: Non-checkpointing frameworks

2016-10-17 Thread Zhitao Li
+1 to both A and B.

Do we plan to eventually drop non-checkpointed framework support (possibly
in v2) and declare that all frameworks have to operate under this assumption?

On Mon, Oct 17, 2016 at 1:36 AM, Aaron Carey  wrote:

> +1 to A and B
>
> Aaron Carey
> Production Engineer - Cloud Pipeline
> Industrial Light & Magic
> London
> 020 3751 9150
>
>
> On 17 October 2016 at 00:38, Qian Zhang  wrote:
>
>> and requires operators to enable checkpointing on the slaves.
>>
>>
>> Just curious why an operator needs to enable checkpointing on the slaves (I
>> do not see an agent flag for that); I think checkpointing should be enabled
>> at the framework level rather than the slave level.
>>
>>
>> Thanks,
>> Qian Zhang
>>
>> On Sun, Oct 16, 2016 at 10:18 AM, Zameer Manji  wrote:
>>
>>> +1 to A and B
>>>
>>> Aurora has enabled checkpointing for years and requires operators to
>>> enable
>>> checkpointing on the slaves.
>>>
>>> On Sat, Oct 15, 2016 at 11:57 AM, Joris Van Remoortere <
>>> jo...@mesosphere.io>
>>> wrote:
>>>
>>> > I'm in favor of A & B. I find it provides a better "first experience"
>>> to
>>> > users.
>>> > From my experience you usually have to have an explicit reason to not
>>> want
>>> > to checkpoint. Most people assume the semantics provided by the
>>> checkpoint
>>> > behavior is default and it can be a frustrating experience for them to
>>> find
>>> > out that is not the case.
>>> >
>>> > —
>>> > *Joris Van Remoortere*
>>>
>>> > Mesosphere
>>> >
>>> > On Fri, Oct 14, 2016 at 3:11 PM, Neil Conway 
>>> > wrote:
>>> >
>>> >> Hi folks,
>>> >>
>>> >> I'd like input from individuals who currently use frameworks but do
>>> >> not enable checkpointing.
>>> >>
>>> >> Background: "checkpointing" is a parameter that can be enabled in
>>> >> FrameworkInfo; if enabled, the agent will write the framework pid,
>>> >> executor PIDs, and status updates to disk for any tasks started by
>>> >> that framework. This checkpointed information means that these tasks
>>> >> can survive an agent crash: if the agent exits (whether due to
>>> >> crashing or as part of an upgrade procedure), a restarted agent can
>>> >> use this information to reconnect to executors started by the previous
>>> >> instance of the agent. The downside is that checkpointing requires
>>> >> some additional disk I/O at the agent.
>>> >>
>>> >> Checkpointing is not currently the default, but in my experience it is
>>> >> often enabled for production frameworks. As part of the work on
>>> >> supporting partition-aware Mesos frameworks (see MESOS-4049), we are
>>> >> considering:
>>> >>
>>> >> (a) requiring that partition-aware frameworks must also enable
>>> >> checkpointing, and/or
>>> >> (b) enabling checkpointing by default
>>> >>
>>> >> If you have intentionally decided to disable checkpointing for your
>>> >> Mesos framework, I'd be curious to hear more about your use-case and
>>> >> why you haven't enabled it.
>>> >>
>>> >> Thanks!
>>> >>
>>> >> Neil
>>> >>
>>> >> --
>>> >> Zameer Manji
>>> >>
>>> >
>>>
>>
>>
>


-- 
Cheers,

Zhitao Li
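For readers new to the thread: agent checkpointing amounts to persisting the framework pid, executor pids, and status updates as they happen, and reading them back after a restart. A toy sketch of that write/recover cycle (not the agent's actual on-disk layout), using write-then-rename so a crash mid-write never leaves a corrupt checkpoint:

```python
import json
import os
import tempfile

def checkpoint(run_dir, state):
    """Persist state atomically: write a temp file, then rename."""
    path = os.path.join(run_dir, "state.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def recover(run_dir):
    """Return the last checkpointed state, or None if none exists --
    the no-checkpoint case, where running tasks cannot be recovered."""
    path = os.path.join(run_dir, "state.json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

run_dir = tempfile.mkdtemp()
checkpoint(run_dir, {"framework_pid": 4242, "executors": {"exec-1": 4243}})
state = recover(run_dir)  # what a restarted agent would read back
```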


Re: How many roles are we supported?

2016-09-08 Thread Zhitao Li
I'll share some of our targets which we aim to support per Mesos cluster,
which may not be representative:
- up to about 100 roles;
- up to low hundreds of frameworks;
- up to low tens of thousands of agents.

On Thu, Sep 8, 2016 at 12:42 AM, Klaus Ma  wrote:

> any suggestion?
>
> On Wed, Sep 7, 2016 at 11:35 AM Klaus Ma  wrote:
>
>> + user@
>>
>>
>> On Wed, Sep 7, 2016 at 11:31 AM Klaus Ma  wrote:
>>
>>> IMO, it does not make sense to let users try it :). It's better for us
>>> (Mesos Dev) to provide suggestions :).
>>>
>>> On Wed, Sep 7, 2016 at 11:27 AM Zhitao Li  wrote:
>>>
>>>> I think polling the user group for how people use or plan to use Mesos will
>>>> help.
>>>>
>>>> I personally already know at least two different ways of modeling
>>>> multiple
>>>> workloads to roles and frameworks in Mesos, which results in quite
>>>> different numbers for roles and frameworks even for similarly sized
>>>> clusters.
>>>>
>>>> On Tue, Sep 6, 2016 at 7:54 PM, Klaus Ma 
>>>> wrote:
>>>>
>>>> > Question on Mesos's scalability of 1.0: how many roles are we going to
>>>> > support? how many nodes are we going to support? how many frameworks
>>>> are we
>>>> > going to support? ...
>>>> >
>>>> > When using Mesos as a resource manager, that information is important
>>>> > to us when proposing solutions.
>>>> >
>>>> > And in the community, it's better for us to have a target for
>>>> > performance-related projects; it takes time to keep improving
>>>> > performance :).
>>>> >
>>>> > Thanks
>>>> > Klaus
>>>> > --
>>>> >
>>>> > Regards,
>>>> > 
>>>> > Da (Klaus), Ma (马达), PMP® | Software Architect
>>>> > IBM Platform Development & Support, STG, IBM GCG
>>>> > +86-10-8245 4084 | mad...@cn.ibm.com | http://k82.me
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Cheers,
>>>>
>>>> Zhitao Li
>>>>
>>> --
>>>
>>> Regards,
>>> 
>>> Da (Klaus), Ma (马达), PMP® | Software Architect
>>> IBM Platform Development & Support, STG, IBM GCG
>>> +86-10-8245 4084 | mad...@cn.ibm.com | http://k82.me
>>>
>> --
>>
>> Regards,
>> 
>> Da (Klaus), Ma (马达), PMP® | Software Architect
>> IBM Platform Development & Support, STG, IBM GCG
>> +86-10-8245 4084 | mad...@cn.ibm.com | http://k82.me
>>
> --
>
> Regards,
> 
> Da (Klaus), Ma (马达), PMP® | Software Architect
> IBM Platform Development & Support, STG, IBM GCG
> +86-10-8245 4084 | mad...@cn.ibm.com | http://k82.me
>



-- 
Cheers,

Zhitao Li


Re: [VOTE] Release Apache Mesos 1.0.1 (rc1)

2016-08-15 Thread Zhitao Li
+1 (nonbinding)

Deployed a build to a smaller testing cluster and no issues were found.

On Mon, Aug 15, 2016 at 11:15 AM, Yan Xu  wrote:

> +1 (binding)
>
> Ran make check on macOS 10.11.5 and clang-703.0.31.
> Additionally, although not rigorous enough as a proof, we deployed a
> version of head (> 1.0) that includes fixes in this release and it's
> working fine (checked that webUI redirect worked and our test workloads
> ran).
>
> Yan
>
> On Aug 13, 2016, at 6:38 AM, haosdent  wrote:
>
> +1 (non-binding)
>
> Run `sudo make check` on CentOS 7.2 and Ubuntu 14.04
>
> On Sat, Aug 13, 2016 at 6:07 AM, Kapil Arya  wrote:
>
>> +1 (binding)
>>
>> You can find the rpm/deb packages here:
>>   http://open.mesosphere.com/downloads/mesos-rc/#apache-mesos-1.0.1-rc1
>>
>> The following docker tags (built off of ubuntu 14.04) are also available:
>> mesosphere/mesos:1.0.1-rc1
>> mesosphere/mesos-master:1.0.1-rc1
>> mesosphere/mesos-slave:1.0.1-rc1
>>
>> Kapil
>>
>> On Fri, Aug 12, 2016 at 4:39 PM, Alex Rukletsov 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> make check on Mac OS 10.11.6 with apple clang-703.0.31.
>>>
>>> DockerFetcherPluginTest.INTERNET_CURL_FetchImage is flaky (MESOS-4570),
>>> but
>>> this does not seem to be a regression or a blocker.
>>>
>>> On Fri, Aug 12, 2016 at 10:30 PM, Radoslaw Gruchalski <
>>> ra...@gruchalski.com>
>>> wrote:
>>>
>>> > I am trying to build Mesos 1.0.1 for Centos 7 in a Docker container but
>>> > I'm hitting this: https://issues.apache.org/jira/browse/MESOS-5925.
>>> >
>>> > Kind regards,
>>> >
>>> > Radek Gruchalski
>>> > ra...@gruchalski.com
>>> > +4917685656526
>>> >
>>> > *Confidentiality:*
>>> > This communication is intended for the above-named person and may be
>>> > confidential and/or legally privileged.
>>> > If it has come to you in error you must take no action based on it, nor
>>> > must you copy or show it to anyone; please delete/destroy and inform
>>> the
>>> > sender immediately.
>>> >
>>> > On Thu, Aug 11, 2016 at 2:32 AM, Vinod Kone 
>>> wrote:
>>> >
>>> >> Hi all,
>>> >>
>>> >>
>>> >> Please vote on releasing the following candidate as Apache Mesos
>>> 1.0.1.
>>> >>
>>> >>
>>> >> The CHANGELOG for the release is available at:
>>> >>
>>> >> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_p
>>> >> lain;f=CHANGELOG;hb=1.0.1-rc1
>>> >>
>>> >> 
>>> >> 
>>> >>
>>> >>
>>> >> The candidate for Mesos 1.0.1 release is available at:
>>> >>
>>> >> https://dist.apache.org/repos/dist/dev/mesos/1.0.1-rc1/mesos
>>> -1.0.1.tar.gz
>>> >>
>>> >>
>>> >> The tag to be voted on is 1.0.1-rc1:
>>> >>
>>> >> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit
>>> ;h=1.0.1-rc1
>>> >>
>>> >>
>>> >> The MD5 checksum of the tarball can be found at:
>>> >>
>>> >> https://dist.apache.org/repos/dist/dev/mesos/1.0.1-rc1/mesos
>>> >> -1.0.1.tar.gz.md5
>>> >>
>>> >>
>>> >> The signature of the tarball can be found at:
>>> >>
>>> >> https://dist.apache.org/repos/dist/dev/mesos/1.0.1-rc1/mesos
>>> >> -1.0.1.tar.gz.asc
>>> >>
>>> >>
>>> >> The PGP key used to sign the release is here:
>>> >>
>>> >> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>> >>
>>> >>
>>> >> The JAR is up in Maven in a staging repository here:
>>> >>
>>> >> https://repository.apache.org/content/repositories/orgapache
>>> mesos-1155
>>> >>
>>> >>
>>> >> Please vote on releasing this package as Apache Mesos 1.0.1!
>>> >>
>>> >>
>>> >> The vote is open until Mon Aug 15 17:29:33 PDT 2016 and passes if a
>>> >> majority of at least 3 +1 PMC votes are cast.
>>> >>
>>> >>
>>> >> [ ] +1 Release this package as Apache Mesos 1.0.1
>>> >>
>>> >> [ ] -1 Do not release this package because ...
>>> >>
>>> >>
>>> >> Thanks,
>>> >>
>>> >
>>> >
>>>
>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>
>
>


-- 
Cheers,

Zhitao Li


Re: Using mesos' cfs limits on a docker container?

2016-08-15 Thread Zhitao Li
Hi Mark,

About the curl issue on SSL, can you please check whether
https://issues.apache.org/jira/browse/MESOS-6005 is similar to what you see?

On Sun, Aug 14, 2016 at 12:23 PM, Artem Harutyunyan 
wrote:

> Hi Mark,
>
> Good to hear you figured it out. Can you please post curl errors that you
> were observing and describe your image repository setup? I'd like to make
> sure that we have instructions on how to mitigate those.
>
> Artem.
>
> On Sunday, August 14, 2016, Mark Hammons 
> wrote:
>
>> In specific, I wanted the process control capabilities of a mesos
>> framework with custom schedulers and executors, but wanted to run my tasks
>> in a framework definable environment (like running my tasks on a copy of
>> Ubuntu 14 with certain libs installed). Using mixed-mode containerization
>> worked with some fiddling, but it was painful in certain ways. The sandbox
>> mounted in a mixed-mode container wasn't accessible from within the
>> container thanks to selinux unless I ran the container in privileged mode
>> and the cpu limits per executor were no longer enforced, unlike a mesos task
>> with cfs isolation enabled. Further, setting up the default working
>> directory and user was a pain.
>>
>> Unified mode (also called mesos containerizer for some reason) solves a
>> lot of these issues, though using it with private image repositories was
>> not as straightforward as the docker containerizer. I eventually had to use
>> an image directory to get that working, cause curl kept throwing vague ssl
>> errors(I'm fairly certain this is due to my private image repository not
>> having https set up since it's a test environment).
>>
>> Once I get things set up and cleaned up I'll post a more involved guide
>> on how to get this particular use case set up and running, especially a part
>> on preparing your container image for use with mesos.
>>
>> Mark Edgar Hammons II - Research Engineer at BioEmergences
>> 0603695656
>>
>> On 14 Aug 2016, at 18:11, Erik Weathers  wrote:
>>
>> What was the problem and how did you overcome it?  (i.e. This would be a
>> sad resolution to this thread for someone faced with this same problem in
>> the future.)
>>
>> On Sunday, August 14, 2016, Mark Hammons 
>> wrote:
>>
>>> I finally got this working after fiddling with it all night. It works
>>> great so far!
>>>
>>> Mark Edgar Hammons II - Research Engineer at BioEmergences
>>> 0603695656
>>>
>>> On 14 Aug 2016, at 04:50, Joseph Wu  wrote:
>>>
>>> If you're not against running Docker containers without the Docker
>>> daemon, try using the Unified containerizer.
>>> See the latter half of this document:
>>> http://mesos.apache.org/documentation/latest/mesos-containerizer/
>>>
>>> On Sat, Aug 13, 2016 at 7:02 PM, Mark Hammons <
>>> mark.hamm...@inaf.cnrs-gif.fr> wrote:
>>>
>>>> Hi All,
>>>>
>>>>
>>>>
>>>> I was having a lot of success having mesos force sandboxed programs to
>>>> work within cpu and memory constraints, but when I added docker into the
>>>> mix, the cpu limitations go out the window (not sure about the memory
>>>> limitations. Is there any way to mix these two methods of isolation? I'd
>>>> like my executor/algorithm to run inside a docker container, but have that
>>>> container's memory and cpu usage controlled by systemd/mesos.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mark
>>>> --
>>>>
>>>> Mark Hammons - +33 06 03 69 56 56
>>>>
>>>> Research Engineer @ BioEmergences
>>>>
>>>> Lab Phone: 01 69 82 34 19
>>>>
>>>
>>>


-- 
Cheers,

Zhitao Li
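On the CFS limits discussed above: Mesos's cgroups/cpu isolator translates a fractional `cpus` resource into a CFS quota over a 100ms period, and the kernel enforces a 1ms minimum quota. A sketch of that arithmetic — the constants reflect my understanding of the defaults, so treat them as assumptions:

```python
CFS_PERIOD_US = 100_000  # 100ms period (the default Mesos uses)
MIN_QUOTA_US = 1_000     # the kernel rejects quotas below 1ms

def cfs_quota_us(cpus):
    """CFS quota (microseconds of CPU time per period) for a 'cpus' share."""
    return max(int(cpus * CFS_PERIOD_US), MIN_QUOTA_US)

quota = cfs_quota_us(1.5)  # 150000us per 100000us period -> 1.5 CPUs
```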


Re: [Mesos 2.0] Let's talk about the future

2016-07-29 Thread Zhitao Li
I think an email thread here will quickly get confusing with everyone
replying inline. Do we have a roadmap Google doc for Mesos 2.0 where people
can collectively comment and propose items?

On Fri, Jul 29, 2016 at 12:37 AM, Olivier Sallou 
wrote:

>
>
> --
>
> *De: *"Jay JN Guo" 
> *À: *"user" , "mesos" 
> *Envoyé: *Vendredi 29 Juillet 2016 09:13:20
> *Objet: *[Mesos 2.0] Let's talk about the future
>
> Hi,
>
> As we are all excited about release 1.0.0, it's never too early to talk
> about next big thing: Mesos 2.0.0. What major things should be done next?
>
> I believe there are still many features you desire in Mesos and some of
> them are already under development. I'd like to collect your minds and
> align the vision in this mail thread. For example, here are items on Mesos
> long term roadmap:
>
> Pluggable Fetcher
> Oversubscription for reservation: Optimistic offers
> Resource Revocation
> Pod support
> Quota chunks
> Multiple-role support for frameworks
> User namespace support
>
> What features do you expect from this?  Is it running a task/container as
> a different user on a per container basis (root in container but seen as
> user X on host)? (as expected in Docker in the future; it seems this also
> needs Linux kernel updates)
>
>
> Event bus
> First class resources (Cpu topology info, GPU topology info, disk speed,
> etc)
>
> there was a quite recent proposal about location awareness (rack etc...)
> which also looks interesting
>
>
> Deprecate Docker containerizer (in favor of Unified containerizer w/
> Docker support)
>
> while this is long term (let's give people time to switch to unified  ;-)
>  ), deprecation of Docker containerizer should go with support of
> equivalent port mapping over bridge functionality as currently proposed by
> Docker network bridge mode. I know there is a JIRA ticket tracking this
> feature, but without it, I think that you cannot drop the Docker
> containerizer. CNI plugins on mesos are important  (IP per container), but
> should not be mandatory (more complex to install/setup than pure mesos).
> Indeed, CNI integration is not complete with Mesos or other frameworks (you
> do not fully manage ports of Calico etc... via Mesos, basically you only
> ask an IP for your container, all port rules are managed directly via the
> tool), and the current Docker bridge/user mode with Mesos is far easier to
> set up/use.
>
> Olivier
>
>
>
> I would appreciate it if you could either share your ideas or vote on
> these items, and we will discuss it in next community sync.
>
> We may not have an unshakeable conclusion as container technology is
> evolving at an ever faster pace, but the whole community, especially
> newbies like myself, would profoundly benefit from a clear plan and
> priority for next 3-6 months.
>
> Cheers,
> /Jay
>
>
>


-- 
Cheers,

Zhitao Li


Re: [VOTE] Release Apache Mesos 1.0.0 (rc4)

2016-07-26 Thread Zhitao Li
* [MESOS-4909] - Tasks can now specify a kill policy. They are
> >> > > best-effort,
> >> > >
> >> > > because machine failures or forcible terminations may occur.
> >> > > Currently, the
> >> > >
> >> > > only available kill policy is how long to wait between graceful
> >> and
> >> > > forcible
> >> > >
> >> > > task kill. In the future, more policies may be available (e.g.
> >> > hitting
> >> > > an
> >> > >
> >> > > HTTP endpoint, running a command, etc). Note that it is the
> >> > > executor's
> >> > >
> >> > > responsibility to enforce kill policies. For executor-less
> >> > > command-based
> >> > >
> >> > > tasks, the kill is performed via sending a signal to the task
> >> > > process:
> >> > >
> >> > > SIGTERM for the graceful kill and SIGKILL for the forcible kill.
> >> For
> >> > > docker
> >> > >
> >> > > executor-less tasks the grace period is passed to 'docker stop
> >> > > --time'. This
> >> > >
> >> > > feature supersedes the '--docker_stop_timeout', which is now
> >> > > deprecated.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >   * [MESOS-4908] - The task kill policy defined within 'TaskInfo'
> can
> >> now
> >> > > be
> >> > >
> >> > > overridden when the scheduler kills the task. This can be used
> by
> >> > > schedulers
> >> > >
> >> > > to forcefully kill a task which is already being killed, e.g. if
> >> > > something
> >> > >
> >> > > went wrong during a graceful kill and a forcible kill is
> desired.
> >> > Note
> >> > > that
> >> > >
> >> > > it is the executor's responsibility to honor the
> >> > > 'Event.kill.kill_policy'
> >> > >
> >> > > field and override the task's kill policy and kill policy from a
> >> > > previous
> >> > >
> >> > > kill task request. To use this feature, schedulers and executors
> >> must
> >> > >
> >> > >
> >> > > support HTTP API; use the '--http_command_executor' agent flag
> to
> >> > > ensure
> >> > >
> >> > > the agent launches the HTTP API based command executor.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >   * [MESOS-4949] - The executor shutdown grace period can now be
> >> > > configured in
> >> > >
> >> > > `ExecutorInfo`, which overrides the agent flag. When shutting
> >> down an
> >> > >
> >> > >
> >> > > executor the agent will wait in a best-effort manner for the
> grace
> >> > > period
> >> > >
> >> > > specified here before forcibly destroying the container. The
> >> executor
> >> > > must
> >> > >
> >> > > not assume that it will always be allotted the full grace
> period,
> >> as
> >> > > the
> >> > >
> >> > > agent may decide to allot a shorter period and failures /
> forcible
> >> > >
> >> > >
> >> > > terminations may occur. Together with kill policies this gives
> >> > > frameworks
> >> > >
> >> > > flexibility around how to clean up tasks and executors.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >   * [MESOS-3094] - **Experimental** support for launching mesos
> tasks
> >> on
> >> > >
> >> > >
> >> > > Windows. Note that there are no isolation guarantees provided
> yet.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >   * [MESOS-4090] - The `mesos.native` python module has been split
> >> into
> >> > > two,
> >> > >
> >> > > `mesos.executor` and `mesos.scheduler`. This change also removes
> >> > >
> >> > >
 > >> > > unnecessary 3rd-party dependencies from `mesos.executor` and
> >> > >
> >> > >
> >> > > `mesos.scheduler`. `mesos.native` still exists, combining both
> >> > modules
> >> > > for
> >> > >
> >> > > backwards compatibility with existing code.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >   * [MESOS-1478] - Phase I of the Slave to Agent rename is complete.
> >> To
> >> > > support
> >> > >
> >> > > the rename, new duplicate flags (e.g.,
> >> --agent_reregister_timeout),
> >> > > new
> >> > >
> >> > > binaries (e.g., mesos-agent) and WebUI sandbox links have been
> >> added.
> >> > > All
> >> > >
> >> > > the logging output has been updated to use the term 'agent' now.
> >> > > Flags,
> >> > >
> >> > > binaries and scripts with 'slave' keyword have been deprecated
> >> (see
> >> > >
> >> > >
> >> > > "Deprecations section below").
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >   * [MESOS-4312] - **Experimental** support for building and running
> >> > mesos
> >> > > on
> >> > >
> >> > > IBM PowerPC platform.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >   * [MESOS-4189] - Weights for resource roles can now be configured
> >> > > dynamically
> >> > >
> >> > > via the new '/weights' endpoint on the master.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >   * [MESOS-4424] - Support for using Nvidia GPUs as a resource in
> the
> >> > >
> >> > >
> >> > > Mesos "unified" containerizer. This support includes running
> >> > > containers
> >> > >
> >> > > with and without filesystem isolation (i.e. running both
> imageless
> >> > >
> >> > >
> >> > > containers as well as containers using a docker image).
> Frameworks
> >> > > must
> >> > >
> >> > > opt-in to receiving GPU resources via the GPU_RESOURCES
> framework
> >> > >
> >> > >
> >> > > capability (see the scarce resource problem in MESOS-5377). We
> >> > > support
> >> > >
> >> > > 'nvidia-docker'-style docker containers by injecting a volume
> that
> >> > >
> >> > >
> >> > > contains the Nvidia libraries / binaries when the docker image
> has
> >> > >
> >> > >
> >> > > the 'com.nvidia.volumes.needed' label. Support for the docker
> >> > >
> >> > >
> >> > > containerizer will come in a future release.
> >> > >
> >> > >
> >> > >
> >> > >   * [MESOS-5724] - SSL certificate validation allows for additional
> IP
> >> > > address
> >> > >
> >> > > subject alternative name extension verification.
> >> > >
> >> > > The CHANGELOG for the release is available at:
> >> > >
> >> > >
> >> > >
> >> >
> >>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.0.0-rc4
> >> > >
> >> > >
> >> > >
> >> >
> >>
> 
> >> > >
> >> > >
> >> > > The candidate for Mesos 1.0.0 release is available at:
> >> > >
> >> > >
> >> >
> >>
> https://dist.apache.org/repos/dist/dev/mesos/1.0.0-rc4/mesos-1.0.0.tar.gz
> >> > >
> >> > >
> >> > > The tag to be voted on is 1.0.0-rc4:
> >> > >
> >> > >
> >>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.0.0-rc4
> >> > >
> >> > >
> >> > > The MD5 checksum of the tarball can be found at:
> >> > >
> >> > >
> >> > >
> >> >
> >>
> https://dist.apache.org/repos/dist/dev/mesos/1.0.0-rc4/mesos-1.0.0.tar.gz.md5
> >> > >
> >> > >
> >> > > The signature of the tarball can be found at:
> >> > >
> >> > >
> >> > >
> >> >
> >>
> https://dist.apache.org/repos/dist/dev/mesos/1.0.0-rc4/mesos-1.0.0.tar.gz.asc
> >> > >
> >> > >
> >> > > The PGP key used to sign the release is here:
> >> > >
> >> > > https://dist.apache.org/repos/dist/release/mesos/KEYS
> >> > >
> >> > >
> >> > > The JAR is up in Maven in a staging repository here:
> >> > >
> >> > >
> >> https://repository.apache.org/content/repositories/orgapachemesos-1153
> >> > >
> >> > >
> >> > > Please vote on releasing this package as Apache Mesos 1.0.0!
> >> > >
> >> > >
> >> > > [ ] +1 Release this package as Apache Mesos 1.0.0
> >> > >
> >> > > [ ] -1 Do not release this package because ...
> >> > >
> >> > >
> >> > > Thanks,
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Best Regards,
> >> Haosdent Huang
> >>
> >
> >
>



-- 
Cheers,

Zhitao Li
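As an aside for readers of the kill-policy notes above (MESOS-4909): for
executor-less command tasks, the kill is a SIGTERM followed by a SIGKILL once
the grace period expires. The sketch below illustrates that escalation on a
POSIX system; it is a minimal illustration, not Mesos's actual implementation,
and the `python3` binary and timings are assumptions.

```python
import signal
import subprocess
import time

def kill_with_policy(proc, grace_period_secs):
    """Best-effort kill escalation: SIGTERM first (graceful), then SIGKILL
    (forcible) once the grace period expires. Illustrative only; not the
    command executor's real code."""
    proc.send_signal(signal.SIGTERM)  # graceful kill
    deadline = time.time() + grace_period_secs
    while time.time() < deadline:
        if proc.poll() is not None:
            return "graceful"
        time.sleep(0.05)
    proc.kill()  # forcible kill (SIGKILL)
    proc.wait()
    return "forcible"

if __name__ == "__main__":
    # A child that ignores SIGTERM forces escalation to SIGKILL.
    child = subprocess.Popen(
        ["python3", "-c",
         "import signal, time; "
         "signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(60)"])
    time.sleep(0.5)  # give the child time to install its handler
    print(kill_with_policy(child, grace_period_secs=1))
```

Note that, exactly as the release note says, a well-behaved task exits on the
graceful SIGTERM and never sees the forcible kill.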


Re: Enabling basic access authentication

2016-07-12 Thread Zhitao Li
Just went through this: I think the necessary endpoint `/master/state` is
only authenticated after 1.0.0, which is still going through release vote.

Can you share which version of Mesos you are running?

On Tue, Jul 12, 2016 at 5:18 PM, Douglas Nelson  wrote:

> With marathon you can enable basic access authentication to the WebUI with
> the flag --http_credentials.
>
> I expected something similar with the flag --authenticate_http in mesos
> but when I hit the WebUI I'm not prompted to give a username/pass. Is that
> feature not included in mesos or is there a different configuration I need
> to set?
>
> Thanks!
>



-- 
Cheers,

Zhitao Li
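For what it's worth, once `--authenticate_http` takes effect (1.0.0+ per the
note above), clients must send HTTP Basic credentials. A minimal sketch of
building such a request; the host, port, and credentials are placeholders:

```python
import base64
import urllib.request

def basic_auth_request(url, username, password):
    """Build a GET request carrying an HTTP Basic Authorization header,
    as a master started with --authenticate_http would require."""
    request = urllib.request.Request(url)
    token = base64.b64encode(
        ("%s:%s" % (username, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    return request

req = basic_auth_request("http://master.example.com:5050/master/state",
                         "user", "pass")
print(req.get_header("Authorization"))  # Basic dXNlcjpwYXNz
# To actually probe the endpoint: urllib.request.urlopen(req)
```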


Re: [VOTE] Release Apache Mesos 1.0.0 (rc2)

2016-07-12 Thread Zhitao Li
+1 (nonbinding)

Tested by 1) running all tests on Mac OS, and 2) performing an upgrade and
downgrade on a small test cluster for both master and slave.



On Mon, Jul 11, 2016 at 10:13 AM, Kapil Arya  wrote:

> None of the stable builds have SSL yet. The first SSL-enabled stable build
> will be 1.0.0. Sorry for the confusion.
>
> Kapil
>
> On Mon, Jul 11, 2016 at 1:03 PM, Zhitao Li  wrote:
>
> > Hi Kapil,
> >
> > Do you mean that the stable builds from
> > http://open.mesosphere.com/downloads/mesos is using the new
> configuration?
> >
> > On Sun, Jul 10, 2016 at 10:07 AM, Kapil Arya 
> wrote:
> >
> >> The binary rpm/deb packages can be found here:
> >>
> http://open.mesosphere.com/downloads/mesos-rc/#apache-mesos-1.0.0-rc2
> >> .
> >>
> >> Please note that starting with the 1.0.0 release (including RCs and
> >> recent nightly builds), Mesos is configured with SSL and 3rdparty
> >> module dependency installation. Here is the configure command line:
> >> ./configure --enable-libevent --enable-ssl
> >> --enable-install-module-dependencies
> >>
> >> As always, the stable builds are available at:
> >> http://open.mesosphere.com/downloads/mesos
> >>
> >> The instructions for nightly builds are available at:
> >> http://open.mesosphere.com/downloads/mesos-nightly/
> >>
> >> Best,
> >> Kapil
> >>
> >>
> >> On Thu, Jul 7, 2016 at 9:35 PM, Vinod Kone 
> wrote:
> >> >
> >> > Hi all,
> >> >
> >> >
> >> > Please vote on releasing the following candidate as Apache Mesos
> 1.0.0.
> >> >
> >> >
> >> > 1.0.0 includes the following:
> >> >
> >> >
> >>
> 
> >> >
> >> >   * Scheduler and Executor v1 HTTP APIs are now considered stable.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >   * [MESOS-4791] - **Experimental** support for v1 Master and Agent
> >> APIs.
> >> > These
> >> >
> >> > APIs let operators and services (monitoring, load balancers) send
> >> HTTP
> >> >
> >> >
> >> > requests to '/api/v1' endpoint on master or agent. See
> >> >
> >> >
> >> > `docs/operator-http-api.md` for details.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >   * [MESOS-4828] - **Experimental** support for a new `disk/xfs'
> >> isolator
> >> >
> >> >
> >> > has been added to isolate disk resources more efficiently. Please
> >> refer
> >> > to
> >> >
> >> > docs/mesos-containerizer.md for more details.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >   * [MESOS-4355] - **Experimental** support for Docker volume plugin.
> We
> >> > added a
> >> >
> >> > new isolator 'docker/volume' which allows users to use external
> >> volumes
> >> > in
> >> >
> >> > Mesos containerizer. Currently, the isolator interacts with the
> >> Docker
> >> >
> >> >
> >> > volume plugins using a tool called 'dvdcli'. By speaking the
> Docker
> >> > volume
> >> >
> >> > plugin API, most of the Docker volume plugins are supported.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >   * [MESOS-4641] - **Experimental** A new network isolator, the
> >> >
> >> >
> >> > `network/cni` isolator, has been introduced in the
> >> > `MesosContainerizer`. The
> >> >
> >> > `network/cni` isolator implements the Container Network Interface
> >> (CNI)
> >> >
> >> >
> >> > specification proposed by CoreOS.  With CNI the `network/cni`
> >> isolator
> >> > is
> >> >
> >> > able to allocate a network namespace to Mesos containers and
> attach
> >> the
> >> >
> >> >
> >> > container to different types of IP networks by invoking network
> >> drivers
> >> >
> >> >
> >> > called CNI plugins.
> >> >
> >

Re: [VOTE] Release Apache Mesos 1.0.0 (rc2)

2016-07-11 Thread Zhitao Li
ated to use the term 'agent' now.
> Flags,
> >
> >
> > binaries and scripts with 'slave' keyword have been deprecated (see
> >
> >
> > "Deprecations section below").
> >
> >
> >
> >
> >
> >   * [MESOS-4312] - **Experimental** support for building and running
> mesos
> > on
> >
> > IBM PowerPC platform.
> >
> >
> >
> >
> >
> >   * [MESOS-4189] - Weights for resource roles can now be configured
> > dynamically
> >
> > via the new '/weights' endpoint on the master.
> >
> >
> >
> >
> >
> >   * [MESOS-4424] - Support for using Nvidia GPUs as a resource in the
> >
> >
> > Mesos "unified" containerizer. This support includes running
> containers
> >
> >
> > with and without filesystem isolation (i.e. running both imageless
> >
> >
> > containers as well as containers using a docker image). Frameworks
> must
> >
> >
> > opt-in to receiving GPU resources via the GPU_RESOURCES framework
> >
> >
> > capability (see the scarce resource problem in MESOS-5377). We
> support
> >
> >
> > 'nvidia-docker'-style docker containers by injecting a volume that
> >
> >
> > contains the Nvidia libraries / binaries when the docker image has
> >
> >
> > the 'com.nvidia.volumes.needed' label. Support for the docker
> >
> >
> > containerizer will come in a future release.
> >
> >
> >
> >
> >
> >   * [MESOS-5724] - SSL certificate validation allows for additional IP
> > address
> >
> > subject alternative name extension verification.
> >
> >
> >
> >
> >
> > The CHANGELOG for the release is available at:
> >
> >
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.0.0-rc2
> >
> >
> 
> >
> >
> > The candidate for Mesos 1.0.0 release is available at:
> >
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.0.0-rc2/mesos-1.0.0.tar.gz
> >
> >
> > The tag to be voted on is 1.0.0-rc2:
> >
> > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.0.0-rc2
> >
> >
> > The MD5 checksum of the tarball can be found at:
> >
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.0.0-rc2/mesos-1.0.0.tar.gz.md5
> >
> >
> > The signature of the tarball can be found at:
> >
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.0.0-rc2/mesos-1.0.0.tar.gz.asc
> >
> >
> > The PGP key used to sign the release is here:
> >
> > https://dist.apache.org/repos/dist/release/mesos/KEYS
> >
> >
> > The JAR is up in Maven in a staging repository here:
> >
> > https://repository.apache.org/content/repositories/orgapachemesos-1149
> >
> >
> > Please vote on releasing this package as Apache Mesos 1.0.0!
> >
> >
> > The vote is open until Tue Jul 12 15:00:00 PDT 2016 and passes if a
> > majority of at least 3 +1 PMC votes are cast.
> >
> >
> > [ ] +1 Release this package as Apache Mesos 1.0.0
> >
> > [ ] -1 Do not release this package because ...
> >
> >
> > Thanks,
> >
> > Vinod
>



-- 
Cheers,

Zhitao Li
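As an aside for voters: the tarball linked above can be checked against the
published `.md5` file before testing. A small sketch of that comparison; the
PGP signature (`.asc`) should additionally be verified with `gpg --verify`:

```python
import hashlib

def md5_hex(data):
    """Hex MD5 digest, matching the format in the release's .md5 file."""
    return hashlib.md5(data).hexdigest()

def verify_tarball(tarball_path, expected_md5):
    """Compare a downloaded tarball against the published hex digest.
    (The PGP signature should be verified separately with GPG.)"""
    with open(tarball_path, "rb") as f:
        return md5_hex(f.read()) == expected_md5.strip().lower()
```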


Re: Order of URIs in CommandInfo protobuf

2016-06-20 Thread Zhitao Li
Hi Robert,

I also think parallelization of fetching is important for many use cases to
reduce the time it takes to launch a task. If you file a feature request, can
we make sure it's still possible to parallelize downloads?

Also, when a task is launched, all URIs should already be fetched into the
sandbox, so I'm very interested in how out-of-order fetching could break your
use case.



On Mon, Jun 20, 2016 at 12:36 PM, Jie Yu  wrote:

> Robert, I just checked the code and the ordering is not guaranteed since
> we parallelize the download currently.
>
> This sounds like a feature request. Robert, do you want to create a
> ticket? For now, I think a startup script should be able to work around that.
>
> On Mon, Jun 20, 2016 at 11:02 AM, Robert Lacroix 
> wrote:
>
>> Jie, would it hurt if we would guarantee ordering of URIs? I could see
>> use cases where the order in which files are extracted matters. Protobuf
>> preserves ordering of repeated fields, so it shouldn't be a huge effort (it
>> probably already works).
>>
>>  Robert
>>
>> On Jun 17, 2016, at 7:37 PM, Jie Yu  wrote:
>>
>> There is no ordering assumption in the API.
>>
>> - Jie
>>
>> On Fri, Jun 17, 2016 at 10:33 AM, Wil Yegelwel 
>> wrote:
>>
>>> I'm curious whether there is an ordering assumption on the CommandInfo
>>> protobuf or if the order does not matter. The comment in mesos.proto, "Any
>>> URIs specified are fetched before executing the command" seems to imply
>>> that ordering does not matter. I just wanted to confirm that was the case.
>>>
>>> Thanks,
>>> Wil
>>>
>>
>>
>>
>


-- 
Cheers,

Zhitao Li
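To make the conclusion above concrete: URIs in `CommandInfo` are fetched in
parallel, so completion order is not guaranteed, but by launch time everything
is in the sandbox, and ordering-sensitive extraction can be sequenced in a
startup script. A sketch of that idea (the URI names and `fetch` stand-in are
hypothetical, not the real Mesos fetcher):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical URI list as it would appear in CommandInfo.uris.
uris = [
    "http://repo/app.tar.gz",
    "http://repo/config.tar.gz",
    "http://repo/data.bin",
]

def fetch(uri):
    # Stand-in for the real download; returns the sandbox file name.
    return uri.rsplit("/", 1)[-1]

# Downloads run concurrently, so *completion* order is not guaranteed,
# even though pool.map returns its results in input order.
with ThreadPoolExecutor(max_workers=len(uris)) as pool:
    fetched = list(pool.map(fetch, uris))

# If extraction order matters, sequence it yourself after all URIs have
# landed in the sandbox, e.g. via a startup script:
startup = " && ".join(
    "tar -xzf " + name for name in fetched if name.endswith(".tar.gz"))
print(startup)  # tar -xzf app.tar.gz && tar -xzf config.tar.gz
```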


Re: Executors no longer inherit environment variables from the agent

2016-06-20 Thread Zhitao Li
Hi Jie,

Can you confirm that your previous response of `Any environment variables
generated by Mesos (i.e., MESOS_, LIBPROCESS_) will not be affected.` will
still be honored, or explicitly call this out in UPGRADES.md?

Thanks.

On Mon, Jun 20, 2016 at 11:39 AM, Jie Yu  wrote:

> FYI, from Mesos 1.0, the executors will no longer inherit environment
> variables from the agent by default. If you have environment
> variables that you want to pass in to executors, please use `--
> executor_environment_variables` flag on the agent.
>
> commit ce4b3056164a804bea52810173dbd7a418d12641
> Author: Gilbert Song 
> Date:   Sun Jun 19 16:01:10 2016 -0700
>
> Forbid the executor to inherit from slave environment.
>
> Review: https://reviews.apache.org/r/44498/
>
> - Jie
>
> On Tue, Mar 8, 2016 at 11:33 AM, Gilbert Song 
> wrote:
>
> > Hi,
> >
> > TL;DR Executors will no longer inherit environment variables from the
> agent
> > by default in 0.30.
> >
> > Currently, executors are inheriting environment variables from the agent
> > in the Mesos containerizer by default. This is an unfortunate legacy
> > behavior and
> > is insecure. If you do have environment variables that you want to pass
> to
> > the executors, you can set it explicitly by using the
> > `--executor_environment_variables` agent flag.
> >
> > Starting from 0.30, we will no longer allow executors to inherit
> > environment variables from the agent. In other words,
> > `--executor_environment_variables` will be set to “{}” by default. If you
> > do depend on the original behavior, please set
> > `--executor_environment_variables` flag explicitly.
> >
> > Let us know if you have any comments or concerns.
> >
> > Thanks,
> > Gilbert
> >
>



-- 
Cheers,

Zhitao Li
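For operators adjusting to this change: `--executor_environment_variables`
takes a JSON object mapping variable names to string values. A small sketch of
rendering that flag value; the validation here is illustrative, not the
agent's actual parsing:

```python
import json

def executor_env_flag(env):
    """Render the value for --executor_environment_variables, which expects
    a JSON object of string -> string (illustrative check only)."""
    for key, value in env.items():
        if not (isinstance(key, str) and isinstance(value, str)):
            raise ValueError("%r must map a string to a string" % (key,))
    return json.dumps(env, sort_keys=True)

flag_value = executor_env_flag({"PATH": "/bin:/usr/bin", "TZ": "UTC"})
print("--executor_environment_variables='%s'" % flag_value)
# --executor_environment_variables='{"PATH": "/bin:/usr/bin", "TZ": "UTC"}'
```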


Re: New external dependency

2016-06-19 Thread Zhitao Li
Hi Kevin,

Thanks for letting us know. It seems like this is not called out in
upgrades.md, so can you please document this additional dependency there?

Also, can you include the link to the JIRA or patch requiring this
dependency so we can have some context?

Thanks!

On Sat, Jun 18, 2016 at 10:25 AM, Kevin Klues  wrote:

> Hello all,
>
> Just an FYI that the newest libmesos now has an external dependence on
> libelf on Linux. This dependence can be installed via the following
> packages:
>
> CentOS 6/7: yum install elfutils-libelf.x86_64
> Ubuntu14.04:   apt-get install libelf1
>
> Alternatively you can install from source:
> https://directory.fsf.org/wiki/Libelf
>
> For developers, you will also need to install the libelf headers in
> order to build master. This dependency can be installed via:
>
> CentOS: elfutils-libelf-devel.x86_64
> Ubuntu: libelf-dev
>
> Alternatively, you can install from source:
> https://directory.fsf.org/wiki/Libelf
>
> The getting started guide and the support/docker_build.sh scripts have
> been updated appropriately, but you may need to update your local
> environment if you don't yet have these packages installed.
>
> --
> ~Kevin
>



-- 
Cheers,

Zhitao Li


Re: Persistent Volume API Change

2016-05-24 Thread Zhitao Li
I'd vote for fixing the bug directly w/o a deprecation period, because a
framework is always supposed to register with a proper principal to perform
various operations.

As long as we clearly document this in upgrades.md, operators should be able
to properly fix their framework usage and ACLs before they upgrade to a
version including this change, and the observed behavior should not change.

On Mon, May 23, 2016 at 10:23 PM, Greg Mann  wrote:

> Hello all,
> I'm currently working on MESOS-5005
> <https://issues.apache.org/jira/browse/MESOS-5005>, which is fixing a
> small bug in the persistent volumes API. When a new persistent volume is
> created, a `DiskInfo` message is included in the disk resources of the
> volume. Nested within another message in `DiskInfo`, there is a
> `principal`
> <https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L713>
> field which is meant to contain the principal of the framework or operator
> responsible for creating the volume. Correct authorization of Destroy
> operations depends on the value of this `principal` field, so the
> correctness of its value should be enforced.
>
> Up until now, we have not been performing a validation check to ensure
> that the principal contained in `DiskInfo` is equal to the framework or
> operator's principal. I've prepared patches
> <https://reviews.apache.org/r/47515/> which enforce this constraint, and
> I wanted to check here on the mailing lists to see if the community thinks
> we need a deprecation period for these changes. Merging these changes would
> prevent frameworks from creating persistent volumes if they do not
> correctly set the `principal` field, which they have previously been
> permitted to omit. So, it has the potential to break frameworks. However,
> these patches are also necessary to ensure the correctness of Destroy
> operation authorization, and the lack of a check on the `principal` field
> is a bug that should be fixed. It would be great to hear from people who
> are running and/or writing frameworks that make use of persistent volumes,
> to see if those frameworks are setting this field properly.
>
> Thoughts?
>
> Cheers,
> Greg
>



-- 
Cheers,

Zhitao Li
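The validation being added in MESOS-5005 boils down to an equality check
between the principal embedded in the volume's `DiskInfo` and the principal of
the framework or operator issuing the request. A simplified sketch of that
check (the function and field names here are abbreviations, not Mesos's real
validation code; see the linked patches for the actual logic):

```python
def validate_create_principal(disk_principal, request_principal):
    """Simplified sketch of the MESOS-5005 check: a Create operation is
    valid only when the principal stored in the volume's DiskInfo matches
    the principal of the framework or operator issuing the request.
    Returns an error string, or None if valid."""
    if disk_principal is None:
        return "principal in DiskInfo must be set"
    if disk_principal != request_principal:
        return ("principal '%s' in DiskInfo does not match request "
                "principal '%s'" % (disk_principal, request_principal))
    return None  # valid

assert validate_create_principal("svc", "svc") is None
print(validate_create_principal(None, "svc"))
```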


Re: State of DiscoveryInfo

2016-03-28 Thread Zhitao Li
Hi Sargun,

Thanks for the response. In fact I'm trying to add DiscoveryInfo to Aurora
and the minimal deliverable I wanted is make an Aurora job routable by
Mesos DNS.

A couple of questions/replies inlined:


On Mon, Mar 28, 2016 at 10:10 AM, Sargun Dhillon  wrote:

> So, we parse DiscoveryInfo in Mesos DNS, and we can use it to generate
> custom DNS records on behalf of a framework. Mesos-DNS publishes SRV,
> and A records in order to act as an service discovery mechanism for
> applications that are both "inside" and "outside" of a "Mesos
> cluster".
>

This doc <http://mesosphere.github.io/mesos-dns/docs/naming.html> seems to
suggest it's using a "task" name. Do you know the exact source of that: the
Mesos task id, source, or the "name" field on DiscoveryInfo?

>
> The DiscoveryInfo field is also used to indicate other things. For
> example, if several ports are allocated to a task / Container, we may
> be unsure of which ports that container should expect ingress
> connectivity on. Although we can parse the resources, it becomes
> somewhat brittle as we begin to interface with other networks. In
> order to avoid this, the framework can tell us which ports are
> allocated to ingress traffic. In my opinion, any tasks that are
> planning on accepting traffic should expose this information via
> DiscoveryInfo. This information can then be taken to configure tools
> like IPTables, or other network filtering.
>
> Lastly, DiscoveryInfo has some free form fields (labels). These labels
> can be used to configure load balancing tools like HAProxy, etc. We
> have some internal standards we use that can auto-configure virtual
> IPs to map to the tasks that have the right port label format. This
> makes it incredibly easy to wire up tasks within the cluster.
>

Do you have such conventions documented? I can cross link that in Aurora's
doc.


>
> On Tue, Mar 22, 2016 at 1:42 AM, haosdent  wrote:
> > As I know, Mesos-DNS use discoveryInfo from Mesos.
> > https://github.com/mesosphere/mesos-dns
> >
> > I also found some links may be useful for you:
> >
> > https://open.mesosphere.com/tutorials/service-discovery/
> >
> https://mesosphere.github.io/marathon/docs/service-discovery-load-balancing.html
> >
> http://events.linuxfoundation.org/sites/events/files/slides/mesos-networking.mesoscon2015.pdf
> >
> > And the design doc of DiscoveryInfo also show "Example Uses" about it:
> >
> https://docs.google.com/document/d/1tpnjfHsa5Joka23CqgGppqnK0jODcElBvTFUBBO-A38/edit#heading=h.4wpt5efsi44n
> >
> > On Tue, Mar 22, 2016 at 4:31 PM, tommy xiao  wrote:
> >>
> >> need a related issue to tracking
> >>
> >> 2016-03-22 13:24 GMT+08:00 Zhitao Li :
> >>>
> >>> Hi,
> >>>
> >>> Does anyone have an example of using the DiscoveryInfo from Mesos 0.22?
> >>> I'm interested in understanding its current status and adoption
> situation,
> >>> whether any real service discovery system is using it, and what's the
> >>> blocker if not.
> >>>
> >>> Thanks.
> >>>
> >>> --
> >>> Cheers,
> >>>
> >>> Zhitao Li
> >>
> >>
> >>
> >>
> >> --
> >> Deshi Xiao
> >> Twitter: xds2000
> >> E-mail: xiaods(AT)gmail.com
> >
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
>



-- 
Cheers,

Zhitao Li
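Per the Mesos-DNS naming doc linked above, A records are derived from the
task name, framework name, and cluster domain, roughly
`<task>.<framework>.<domain>`. A sketch of that derivation; the
character-cleaning rules below are an approximation, not Mesos-DNS's exact
implementation:

```python
import re

def mesos_dns_name(task_name, framework_name, domain="mesos"):
    """Sketch of the A-record name Mesos-DNS derives from task state:
    <task>.<framework>.<domain>, lowercased with invalid characters
    replaced. Approximation only."""
    def clean(name):
        return re.sub(r"[^a-z0-9-]", "-", name.lower()).strip("-")
    return "%s.%s.%s" % (clean(task_name), clean(framework_name), domain)

print(mesos_dns_name("web_server", "Marathon"))  # web-server.marathon.mesos
```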


Re: How to make full version available in /version endpoint

2016-03-23 Thread Zhitao Li
Erik, that would still be a problem if an organization is building Mesos
between release versions.

Jeff/Vinod, it seems like the 0.26.0 part comes from this line:

AC_INIT([mesos], [0.29.0])

and it's possible to patch that line to allow a custom version (read from a
file or variable using m4_esyscmd_s).



On Wed, Mar 23, 2016 at 5:24 PM, Erik Weathers 
wrote:

> The extra "-2.0.16" portion of that version number is an artifact from
> Mesosphere's build system, and my understanding is they are going to get
> rid of it.  So perhaps this will not be a problem in the future?
>
> - Erik
>
> On Wed, Mar 23, 2016 at 5:10 PM, Jeff Schroeder <
> jeffschroe...@computer.org> wrote:
>
>> Perhaps building your own version, with your own version string would be
>> sufficient? A general purpose feature to override the stated version with
>> an environment variable doesn't seem very applicable in many environments.
>> Perhaps there is a different way you could accomplish the same ultimate
>> goal?
>>
>>
>> On Wednesday, March 23, 2016, Zhitao Li  wrote:
>>
>>> We want to have an external system to monitor or manage the full Mesos
>>> cluster, and neither the current "version" nor git_sha seems sufficient to
>>> determine whether a build being run is what we needed, especially when we
>>> move to our own packages.
>>>
>>> Being able to override the "version" key with an environment variable is
>>> probably sufficient for us.
>>>
>>> On Wed, Mar 23, 2016 at 4:51 PM, Vinod Kone 
>>> wrote:
>>>
>>>> Not currently, no. What's your use case?
>>>>
>>>> On Wed, Mar 23, 2016 at 3:50 PM, Zhitao Li 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Has anyone brought up the possibility of making the full version
>>>>> (i.e. 0.28.0-2.0.16.debian81a) show up in the /version endpoint?
>>>>>
>>>>> For example, when we are using the mesosphere community package, we
>>>>> want the '0.27.1-2.0.226.debian81' string to show up, but we only get
>>>>> the following right now:
>>>>>
>>>>> {
>>>>>   "build_date": "2016-02-23 00:39:17",
>>>>>   "build_time": 1456187957,
>>>>>   "build_user": "root",
>>>>>   "git_sha": "864fe8eabd4a83b78ce9140c501908ee3cb90beb",
>>>>>   "git_tag": "0.27.1",
>>>>>   "version": "0.27.1"
>>>>> }
>>>>>
>>>>> Is there an environment variable or something which we could tweak at
>>>>> build/package time to get it? Thanks!
>>>>>
>>>>> --
>>>>> Cheers,
>>>>>
>>>>> Zhitao Li
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Cheers,
>>>
>>> Zhitao Li
>>>
>>
>>
>> --
>> Text by Jeff, typos by iPhone
>>
>
>


-- 
Cheers,

Zhitao Li


Re: How to make full version available in /version endpoint

2016-03-23 Thread Zhitao Li
We want to have an external system to monitor or manage the full Mesos
cluster, and neither the current "version" nor git_sha seems sufficient to
determine whether a build being run is what we needed, especially when we
move to our own packages.

Being able to override the "version" key with an environment variable is
probably sufficient for us.

On Wed, Mar 23, 2016 at 4:51 PM, Vinod Kone  wrote:

> Not currently, no. What's your use case?
>
> On Wed, Mar 23, 2016 at 3:50 PM, Zhitao Li  wrote:
>
>> Hi,
>>
>> Has anyone brought up the possibility of making the full version
>> (i.e. 0.28.0-2.0.16.debian81a) show up in the /version endpoint?
>>
>> For example, when we are using the mesosphere community package, we want
>> the '0.27.1-2.0.226.debian81' string to show up, but we only get the
>> following right now:
>>
>> {
>>   "build_date": "2016-02-23 00:39:17",
>>   "build_time": 1456187957,
>>   "build_user": "root",
>>   "git_sha": "864fe8eabd4a83b78ce9140c501908ee3cb90beb",
>>   "git_tag": "0.27.1",
>>   "version": "0.27.1"
>> }
>>
>> Is there an environment variable or something which we could tweak at
>> build/package time to get it? Thanks!
>>
>> --
>> Cheers,
>>
>> Zhitao Li
>>
>
>


-- 
Cheers,

Zhitao Li


How to make full version available in /version endpoint

2016-03-23 Thread Zhitao Li
Hi,

Has anyone brought up the possibility of making the full version
(i.e. 0.28.0-2.0.16.debian81a) show up in the /version endpoint?

For example, when we are using the mesosphere community package, we want the
'0.27.1-2.0.226.debian81' string to show up, but we only get the following
right now:

{
  "build_date": "2016-02-23 00:39:17",
  "build_time": 1456187957,
  "build_user": "root",
  "git_sha": "864fe8eabd4a83b78ce9140c501908ee3cb90beb",
  "git_tag": "0.27.1",
  "version": "0.27.1"
}

Is there an environment variable or something which we could tweak at
build/package time to get it? Thanks!

-- 
Cheers,

Zhitao Li
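Given the `/version` payload above, the best an external monitor can currently
do is prefix-match the bare release version against the expected full package
version. A sketch of that check, using the sample payload from this thread:

```python
import json

sample = """{
  "build_date": "2016-02-23 00:39:17",
  "build_time": 1456187957,
  "build_user": "root",
  "git_sha": "864fe8eabd4a83b78ce9140c501908ee3cb90beb",
  "git_tag": "0.27.1",
  "version": "0.27.1"
}"""

info = json.loads(sample)

def is_expected_build(info, expected_full_version):
    """Prefix-match only: the endpoint exposes the bare release version,
    not the package suffix, so this cannot distinguish package builds."""
    return expected_full_version.startswith(info["version"])

print(is_expected_build(info, "0.27.1-2.0.226.debian81"))  # True
```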


State of DiscoveryInfo

2016-03-21 Thread Zhitao Li
Hi,

Does anyone have an example of using the DiscoveryInfo from Mesos 0.22? I'm
interested in understanding its current status and adoption situation,
whether any real service discovery system is using it, and what's the
blocker if not.

Thanks.

-- 
Cheers,

Zhitao Li


Re: Current state of the oversubscription feature

2016-03-21 Thread Zhitao Li
Hi Stephan,

Glad someone shares an interest in this topic; my company is also very
interested. A couple of thoughts:

1. I believe the real difficulties here come from isolation: how would Mesos
handle overcommitted memory, given that it cannot be throttled like CPU?
2. Handling this within a single Mesos framework could differ from the case
of running multiple frameworks.
3. I know you are active on Apache Aurora. I believe right now Aurora does
not consider RAM a revocable resource, but we could probably work together to
expand that once we know the isolation story.


On Mon, Mar 21, 2016 at 8:30 AM, Erb, Stephan 
wrote:

> Judging from the epic description, this seems to target the
> oversubscription of reserved resources on the framework level.
>
>
> However, my question was targeting the task level, where one task of a
> framework is requesting more RAM than it actually uses, and another tasks
> from the same framework can be started as revocable and use those slack
> resources.
>
>
> The latter is already possible with compressible resources such as CPU or
> bandwidth. I am now interested in non-compressible resources (i.e. memory).
>
>
> --
> *From:* Guangya Liu 
> *Sent:* Monday, March 21, 2016 15:53
> *To:* user@mesos.apache.org
> *Subject:* Re: Current state of the oversubscription feature
>
> https://issues.apache.org/jira/browse/MESOS-4967 is planning to introduce
> "Oversubscription for reservation", can you please help check if this
> help?
>
> Thanks,
>
> Guangya
>
> On Mon, Mar 21, 2016 at 8:54 PM, Erb, Stephan  > wrote:
>
>> Hi everyone,
>>
>> I am interested in the current state of the Mesos oversubscription
>> feature [1]. In particular, I would like to know if anyone has taken a
>> closer look at non-compressible resources such as memory.
>>
>> Anything I should be aware of?
>>
>> Thanks and Best Regards,
>> Stephan
>>
>> [1] http://mesos.apache.org/documentation/latest/oversubscription/
>
>
>


-- 
Cheers,

Zhitao Li


Re: [VOTE] Release Apache Mesos 0.28.0 (rc2)

2016-03-19 Thread Zhitao Li
I don't think it's a blocking issue after some initial investigation.

Changing my vote to +1 (nonbinding)

On Wed, Mar 16, 2016 at 6:07 PM, Vinod Kone  wrote:

>
> On Wed, Mar 16, 2016 at 5:59 PM, Daniel Osborne <
> daniel.osbo...@metaswitch.com> wrote:
>
>> Is this issue a blocker? Are we moving to rc3 or proceeding with 0.28.0?
>>
>
> It was not marked as such, so I'm guessing not. @Jie and @Zhitao, can you
> confirm?
>
> Also, we still need some binding votes for this release to go official.
> @committers: can you please vote?
>



-- 
Cheers,

Zhitao Li


Re: [VOTE] Release Apache Mesos 0.28.0 (rc2)

2016-03-15 Thread Zhitao Li
Marked duplicate. Thanks!

On Tue, Mar 15, 2016 at 5:56 AM, Jörg Schad  wrote:

> I believe
> the ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand
> issue is already tracked here:
> https://issues.apache.org/jira/browse/MESOS-4810
> @zhitaio could you check whether this describes your issue (if so could
> you close the new issue as duplicate?). Thanks!
>
> On Tue, Mar 15, 2016 at 6:55 AM, Zhitao Li  wrote:
>
>> Filed https://issues.apache.org/jira/browse/MESOS-4946 to track.
>>
>> All "OsTest" passes under root on my machine.
>>
>> On Mon, Mar 14, 2016 at 6:30 PM, haosdent  wrote:
>>
>>> Maybe filing a ticket in https://issues.apache.org/jira/browse/MESOS
>>> would be more convenient for further discussion. By the way, does
>>> "OsTest.User" pass on your machine? It also calls "os::getgid" during the test.
>>>
>>> On Tue, Mar 15, 2016 at 6:57 AM, Zhitao Li 
>>> wrote:
>>>
>>>> When running `sudo make check` on debian 8, I saw the following
>>>> unaccounted test failure:
>>>>
>>>> [ FAILED ]
>>>> ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand (1129
>>>> ms)
>>>>
>>>> It seems to be related to an error message: `Failed to change user to
>>>> 'root': Failed to getgid: unknown user`
>>>>
>>>> I've included verbose test log output at
>>>> https://gist.github.com/zhitaoli/95436f4ea2df13c4b137.
>>>>
>>>> On Mon, Mar 14, 2016 at 2:59 PM, Daniel Osborne <
>>>> daniel.osbo...@metaswitch.com> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Ran `sudo make check` on Centos 7. All tests passed.
>>>>>
>>>>> Also ran some runtime tests with unified containerizer launching
>>>>> docker images and regular mesos tasks, as well as some tasks using the
>>>>> docker containerizer. All working as expected
>>>>>
>>>>> Cheers,
>>>>> -Dan
>>>>>
>>>>> -Original Message-
>>>>> From: Vinod Kone [mailto:vinodk...@apache.org]
>>>>> Sent: Friday, March 11, 2016 12:46 PM
>>>>> To: dev ; user 
>>>>> Subject: [VOTE] Release Apache Mesos 0.28.0 (rc2)
>>>>>
>>>>> Hi all,
>>>>>
>>>>>
>>>>> Please vote on releasing the following candidate as Apache Mesos
>>>>> 0.28.0.
>>>>>
>>>>>
>>>>> 0.28.0 includes the following:
>>>>>
>>>>>
>>>>> 
>>>>>
>>>>> Release Notes - Mesos - Version 0.28.0
>>>>>
>>>>> 
>>>>>
>>>>> This release contains the following new features:
>>>>>
>>>>>   * [MESOS-4343] - A new cgroups isolator for enabling the net_cls
>>>>> subsystem in
>>>>>
>>>>> Linux. The cgroups/net_cls isolator allows operators to provide
>>>>> network
>>>>>
>>>>> performance isolation and network segmentation for containers
>>>>> within a Mesos
>>>>>
>>>>> cluster. To enable the cgroups/net_cls isolator, append
>>>>> `cgroups/net_cls` to
>>>>>
>>>>> the `--isolation` flag when starting the slave. Please refer to
>>>>>
>>>>> docs/mesos-containerizer.md for more details.
>>>>>
>>>>>
>>>>>   * [MESOS-4687] - The implementation of scalar resource values (e.g.,
>>>>> "2.5
>>>>>
>>>>> CPUs") has changed. Mesos now reliably supports resources with up
>>>>> to three
>>>>>
>>>>> decimal digits of precision (e.g., "2.501 CPUs"); resources with
>>>>> more than
>>>>>
>>>>> three decimal digits of precision will be rounded. Internally,
>>>>> resource math
>>>>>
>>>>> is now done using a fixed-point format that supports three decimal
>>>>> digits of
>>>>>
>>>>> precision, and then converted to/from floating point for input and
>>>>> output,
>>>>>

Re: [VOTE] Release Apache Mesos 0.28.0 (rc2)

2016-03-14 Thread Zhitao Li
Filed https://issues.apache.org/jira/browse/MESOS-4946 to track.

All "OsTest" passes under root on my machine.

On Mon, Mar 14, 2016 at 6:30 PM, haosdent  wrote:

> Maybe filing a ticket in https://issues.apache.org/jira/browse/MESOS would
> be more convenient for further discussion. By the way, does "OsTest.User"
> pass on your machine? It also calls "os::getgid" during the test.
>
> On Tue, Mar 15, 2016 at 6:57 AM, Zhitao Li  wrote:
>
>> When running `sudo make check` on debian 8, I saw the following
>> unaccounted test failure:
>>
>> [ FAILED ]
>> ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand (1129
>> ms)
>>
>> It seems to be related to an error message: `Failed to change user to
>> 'root': Failed to getgid: unknown user`
>>
>> I've included verbose test log output at
>> https://gist.github.com/zhitaoli/95436f4ea2df13c4b137.
>>
>> On Mon, Mar 14, 2016 at 2:59 PM, Daniel Osborne <
>> daniel.osbo...@metaswitch.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Ran `sudo make check` on Centos 7. All tests passed.
>>>
>>> Also ran some runtime tests with unified containerizer launching docker
>>> images and regular mesos tasks, as well as some tasks using the docker
>>> containerizer. All working as expected
>>>
>>> Cheers,
>>> -Dan
>>>
>>> -Original Message-
>>> From: Vinod Kone [mailto:vinodk...@apache.org]
>>> Sent: Friday, March 11, 2016 12:46 PM
>>> To: dev ; user 
>>> Subject: [VOTE] Release Apache Mesos 0.28.0 (rc2)
>>>
>>> Hi all,
>>>
>>>
>>> Please vote on releasing the following candidate as Apache Mesos 0.28.0.
>>>
>>>
>>> 0.28.0 includes the following:
>>>
>>>
>>> 
>>>
>>> Release Notes - Mesos - Version 0.28.0
>>>
>>> 
>>>
>>> This release contains the following new features:
>>>
>>>   * [MESOS-4343] - A new cgroups isolator for enabling the net_cls
>>> subsystem in
>>>
>>> Linux. The cgroups/net_cls isolator allows operators to provide
>>> network
>>>
>>> performance isolation and network segmentation for containers within
>>> a Mesos
>>>
>>> cluster. To enable the cgroups/net_cls isolator, append
>>> `cgroups/net_cls` to
>>>
>>> the `--isolation` flag when starting the slave. Please refer to
>>>
>>> docs/mesos-containerizer.md for more details.
>>>
>>>
>>>   * [MESOS-4687] - The implementation of scalar resource values (e.g.,
>>> "2.5
>>>
>>> CPUs") has changed. Mesos now reliably supports resources with up to
>>> three
>>>
>>> decimal digits of precision (e.g., "2.501 CPUs"); resources with
>>> more than
>>>
>>> three decimal digits of precision will be rounded. Internally,
>>> resource math
>>>
>>> is now done using a fixed-point format that supports three decimal
>>> digits of
>>>
>>> precision, and then converted to/from floating point for input and
>>> output,
>>>
>>> respectively. Frameworks that do their own resource math and
>>> manipulate
>>>
>>> fractional resources may observe differences in roundoff error and
>>> numerical
>>>
>>> precision.
>>>
>>>
>>>   * [MESOS-4479] - Reserved resources can now optionally include
>>> "labels".
>>>
>>> Labels are a set of key-value pairs that can be used to associate
>>> metadata
>>>
>>> with a reserved resource. For example, frameworks can use this
>>> feature to
>>>
>>> distinguish between two reservations for the same role at the same
>>> agent
>>>
>>> that are intended for different purposes.
>>>
>>>
>>>   * [MESOS-2840] - **Experimental** support for container images in Mesos
>>>
>>> containerizer (a.k.a. Unified Containerizer). This allows frameworks
>>> to
>>>
>>> launch Docker/Appc containers using Mesos containerizer without
>>> relying on
>>>
>>> docker daemon (engine) or rkt. The isolation of the containers is
>
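The fixed-point scheme described under MESOS-4687 above can be sketched as follows. This is a simplification for illustration (Mesos' actual implementation is C++ in stout); the scaling factor of 1000 corresponds to the three decimal digits of precision the release notes describe:

```python
SCALE = 1000  # three decimal digits of precision, per MESOS-4687

def to_fixed(value):
    """Convert a floating-point scalar (e.g. 2.5018 CPUs) to an integer
    fixed-point representation, rounding away digits past the third."""
    return int(round(value * SCALE))

def to_float(fixed):
    """Convert back to floating point for input/output."""
    return fixed / SCALE

print(to_fixed(2.5018))            # 2502 (internal representation)
print(to_float(to_fixed(2.5018)))  # 2.502 after the round-trip
```

This is why frameworks doing their own floating-point resource math may observe small roundoff differences: values round-trip through this integer representation.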

Re: [VOTE] Release Apache Mesos 0.28.0 (rc2)

2016-03-14 Thread Zhitao Li
>
> The candidate for Mesos 0.28.0 release is available at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.28.0-rc2/mesos-0.28.0.tar.gz
>
>
> The tag to be voted on is 0.28.0-rc2:
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.28.0-rc2
>
>
> The MD5 checksum of the tarball can be found at:
>
>
> https://dist.apache.org/repos/dist/dev/mesos/0.28.0-rc2/mesos-0.28.0.tar.gz.md5
>
>
> The signature of the tarball can be found at:
>
>
> https://dist.apache.org/repos/dist/dev/mesos/0.28.0-rc2/mesos-0.28.0.tar.gz.asc
>
>
> The PGP key used to sign the release is here:
>
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
>
> The JAR is up in Maven in a staging repository here:
>
> https://repository.apache.org/content/repositories/orgapachemesos-1120
>
>
> Please vote on releasing this package as Apache Mesos 0.28.0!
>
>
> The vote is open until Wed Mar 16 15:43:35 EDT 2016 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
>
> [ ] +1 Release this package as Apache Mesos 0.28.0
>
> [ ] -1 Do not release this package because ...
>
>
> Thanks,
>



-- 
Cheers,

Zhitao Li


Re: Executors no longer inherit environment variables from the agent

2016-03-08 Thread Zhitao Li
Is LIBPROCESS_IP going to be an exception to this? Some executors use this
variable as an alternative to implementing their own IP-detection logic,
AFAIK, so this behavior change would break them.
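For anyone depending on such variables, they can be passed through explicitly with the flag Gilbert mentions. A sketch only: the flag takes a JSON object, the master address, IP, and PATH below are assumptions, and the exact syntax should be verified against your Mesos version's `mesos-slave --help` output:

```shell
# Hypothetical values; --executor_environment_variables takes a JSON object
# mapping variable names to values.
mesos-slave --master=zk://master:2181/mesos \
  --executor_environment_variables='{
    "LIBPROCESS_IP": "10.0.0.5",
    "PATH": "/usr/local/bin:/usr/bin:/bin"
  }'
```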

On Tue, Mar 8, 2016 at 11:33 AM, Gilbert Song  wrote:

> Hi,
>
> TL;DR Executors will no longer inherit environment variables from the
> agent by default in 0.30.
>
> Currently, executors inherit environment variables from the agent in the
> Mesos containerizer by default. This is an unfortunate legacy behavior
> and is insecure. If you have environment variables that you want to pass
> to the executors, you can set them explicitly by using the
> `--executor_environment_variables` agent flag.
>
> Starting from 0.30, we will no longer allow executors to inherit
> environment variables from the agent. In other words,
> `--executor_environment_variables` will be set to “{}” by default. If you
> do depend on the original behavior, please set
> `--executor_environment_variables` flag explicitly.
>
> Let us know if you have any comments or concerns.
>
> Thanks,
> Gilbert
>



-- 
Cheers,

Zhitao Li


Re: URL for viewing persistent volumes

2016-02-29 Thread Zhitao Li
Awesome! Thanks for the info.

Sent from my iPhone

> On Feb 29, 2016, at 10:45 AM, Neil Conway  wrote:
> 
> Hi Zhitao,
> 
> We just implemented this feature; it will appear in Mesos 0.28. You
> will be able to list all the information about the persistent volumes
> and reservations at every slave in the cluster by examining the
> master's "/slaves" endpoint. For more information, see:
> 
> https://issues.apache.org/jira/browse/MESOS-4667
> https://reviews.apache.org/r/44047/
> 
> Neil
> 
> 
>> On Mon, Feb 29, 2016 at 9:35 AM, Zhitao Li  wrote:
>> Hi,
>> 
>> Is there an HTTP URL to list and view the existing persistent volumes created
>> so far? I'm running 0.27.1 and couldn't find how to obtain such info.
>> 
>> Thanks!
>> 
>> --
>> Cheers,
>> 
>> Zhitao Li
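The `/slaves` output Neil mentions can be mined for persistent volumes: they are the disk resources carrying a `persistence` block. The sketch below runs against an embedded sample response; the field names (`used_resources_full`, `disk.persistence`, etc.) are assumptions based on MESOS-4667 and should be checked against your Mesos version's actual endpoint output:

```python
import json

# Hypothetical, trimmed-down sample of the master's /slaves response.
SAMPLE = json.loads("""
{
  "slaves": [
    {
      "id": "abc-S0",
      "hostname": "agent1",
      "used_resources_full": [
        {
          "name": "disk",
          "type": "SCALAR",
          "scalar": {"value": 512.0},
          "role": "db",
          "disk": {
            "persistence": {"id": "vol-1"},
            "volume": {"container_path": "data", "mode": "RW"}
          }
        },
        {"name": "cpus", "type": "SCALAR", "scalar": {"value": 1.0}, "role": "*"}
      ]
    }
  ]
}
""")

def persistent_volumes(state):
    """Yield (hostname, volume id, MB) for every persistent volume found."""
    for agent in state.get("slaves", []):
        for res in agent.get("used_resources_full", []):
            disk = res.get("disk", {})
            if "persistence" in disk:
                yield (agent["hostname"], disk["persistence"]["id"],
                       res["scalar"]["value"])

vols = list(persistent_volumes(SAMPLE))
print(vols)  # [('agent1', 'vol-1', 512.0)]
```

In practice you would fetch the JSON from `http://<master>:5050/slaves` and feed it to the same function.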


URL for viewing persistent volumes

2016-02-29 Thread Zhitao Li
Hi,

Is there an HTTP URL to list and view the existing persistent volumes created
so far? I'm running 0.27.1 and couldn't find how to obtain such info.

Thanks!

-- 
Cheers,

Zhitao Li


Re: Downloading s3 uris

2016-02-26 Thread Zhitao Li
Haven't directly used the s3 download, but I think a workaround (if you don't
care about the ACLs on the files) is to use an http URL instead.
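Concretely, a public-read S3 object is reachable at its virtual-hosted HTTPS address, which the Mesos fetcher can download without any HDFS client. A sketch of the rewrite (the bucket and key are made up, and this only works for objects whose ACL allows anonymous reads, per the caveat above):

```python
from urllib.parse import urlparse

def s3_to_https(s3_uri):
    """Rewrite s3://bucket/key to the public virtual-hosted HTTPS form.
    Only valid for objects readable without credentials."""
    p = urlparse(s3_uri)
    assert p.scheme == "s3", "expected an s3:// URI"
    return "https://%s.s3.amazonaws.com%s" % (p.netloc, p.path)

print(s3_to_https("s3://my-bucket/artifacts/app.tar.gz"))
# https://my-bucket.s3.amazonaws.com/artifacts/app.tar.gz
```

The resulting URL can then be used directly as a fetcher URI in `CommandInfo.uris`.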
> On Feb 26, 2016, at 8:17 AM, Aaron Carey  wrote:
> 
> I'm attempting to fetch files from s3 uris in mesos, but we're not using hdfs 
> in our cluster... however I believe I need the client installed.
> 
> Is it possible to just have the client running without a full hdfs setup?
> 
> I haven't been able to find much information in the docs, could someone point 
> me in the right direction?
> 
> Thanks!
> 
> Aaron



Re: Safe update of agent attributes

2016-02-23 Thread Zhitao Li
Hi Adam,

The command `mesos-slave --recover=cleanup` could indeed be used to clean
up after an incompatible change.

I am still concerned about the possibility that a perfectly valid change to
the attributes or resources values could leave the Mesos agent in a crash
loop, losing critical tasks after --recovery_timeout, if the update sequence
is incorrect.

Can we consider adding a new option like "--auto_recovery_cleanup", which
would automatically perform the cleanup when incompatible slave info is
detected, or changing the default behavior of "--recover"?

Thanks.
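For reference, the cleanup sequence Adam describes below can be sketched as follows. This is a sketch only: the work_dir, master address, and attribute values are assumptions, and (per MESOS-1739) the cleanup kills all tasks running on the agent, so schedulers must be prepared for that:

```shell
# 1) Wipe the old SlaveInfo (kills all tasks on this agent).
mesos-slave --recover=cleanup --work_dir=/var/lib/mesos

# 2) Restart with the new attributes; the agent registers with a new SlaveID.
mesos-slave --master=zk://master:2181/mesos \
            --work_dir=/var/lib/mesos \
            --attributes='rack:r1;public_ip:true'
```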

On Mon, Feb 22, 2016 at 3:41 PM, Adam Bordelon  wrote:

> Currently, changing any --attributes or --resources requires draining the
> agent and killing all running tasks.
> See https://issues.apache.org/jira/browse/MESOS-1739
> You could do a `mesos-slave --recover=cleanup`, which essentially kills
> all the tasks and clears the work_dir; then restart with a `mesos-slave
> --attributes=new_attributes`
> Note that even adding a new attribute is the kind of change that could
> cause a framework scheduler to no longer want its task on that node. For
> example, you add "public_ip=true" and now my scheduler no longer wants to
> run private tasks there. As such, any attribute change needs to notify all
> schedulers of the change.
>
>
> On Mon, Feb 22, 2016 at 2:01 PM, Marco Massenzio 
> wrote:
>
>> IIRC you can avoid the issue by either using a different work_dir for the
>> agent, or removing (and, possibly, re-creating) it.
>>
>> I'm afraid I don't have a running instance of Mesos on this machine and
>> can't test it out.
>>
>> Also (and this is strictly my opinion :) I would consider a change of
>> attribute a "material" change for the Agent and I would avoid trying to
>> recover state from previous runs; but, again, there may be perfectly
>> legitimate cases in which this is desirable.
>>
>> --
>> *Marco Massenzio*
>> http://codetrips.com
>>
>> On Mon, Feb 22, 2016 at 12:11 PM, Zhitao Li 
>> wrote:
>>
>>> Hi,
>>>
>>> We recently discovered that updating attributes on Mesos agents is a
>>> very risky operation, and has the potential to send agent(s) into a crash loop
>>> if not done properly, with errors like "Failed to perform recovery:
>>> Incompatible slave info detected". This, combined with
>>> --recovery_timeout, made the situation even worse.
>>>
>>> In our setup, some of the attributes are generated by an automated
>>> configuration management system, so this opens the possibility that "bad"
>>> configuration could be left on the machine and cause big trouble on the next
>>> agent upgrade, if the USR1 signal was not sent in time.
>>>
>>> Some questions:
>>>
>>> 1. Does anyone have a good practice recommendation for managing these
>>> attributes safely?
>>> 2. Has Mesos considered falling back to the old metadata if it detects an
>>> incompatibility, so agents would keep running with the old attributes instead
>>> of falling into a crash loop?
>>>
>>> Thanks.
>>>
>>> --
>>> Cheers,
>>>
>>> Zhitao Li
>>>
>>
>>
>


-- 
Cheers,

Zhitao


Safe update of agent attributes

2016-02-22 Thread Zhitao Li
Hi,

We recently discovered that updating attributes on Mesos agents is a very
risky operation, and has the potential to send agent(s) into a crash loop if
not done properly, with errors like "Failed to perform recovery:
Incompatible slave info detected". This, combined with --recovery_timeout,
made the situation even worse.

In our setup, some of the attributes are generated by an automated
configuration management system, so this opens the possibility that "bad"
configuration could be left on the machine and cause big trouble on the next
agent upgrade, if the USR1 signal was not sent in time.

Some questions:

1. Does anyone have a good practice recommendation for managing these
attributes safely?
2. Has Mesos considered falling back to the old metadata if it detects an
incompatibility, so agents would keep running with the old attributes instead
of falling into a crash loop?

Thanks.

-- 
Cheers,

Zhitao Li


Re: [VOTE] Release Apache Mesos 0.27.1 (rc1)

2016-02-17 Thread Zhitao Li
+1 (non binding)

Debian 8 (jessie) plain non root OK.
Mac OS X plain non root OK.
Ubuntu 14.04 plain/SSL with root and docker-engine 1.10: The only flaky test I 
observed on Ubuntu 14.04 is HealthCheckTest.HealthStatusChange, which is 
already tracked in MESOS-1802.

> On Feb 17, 2016, at 5:21 AM, Bernd Mathiske  wrote:
> 
> +1 (binding)
> 
> Test failures look a lot like with 0.27.0. Not clean, but nothing deemed too 
> drastic yet.
> 
> CentOS 7 plain:
> FetcherCacheHttpTest.HttpCachedSerialized flaky again, filed MESOS-4692
> LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem known as 
> flaky: MESOS-4674
> LinuxFilesystemIsolatorTest.ROOT_MultipleContainers known as flaky: MESOS-4674
> 
> CentOS 7 SSL-enabled:
> LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem,
> LinuxFilesystemIsolatorTest.ROOT_MultipleContainers
> both known as flaky: MESOS-4674
> 
> CentOS 6 plain:  OK
> 
> CentOS 6 SSL-enabled:
> MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> flaky as often observed before, probably MESOS-4053
> 
> Ubuntu 14.04 plain/SSL, Ubuntu 12.04 plain/SSL, Ubuntu 15 plain: OK,
> 
> Ubuntu 15 SSL-enabled:
> DockerContainerizerTest.ROOT_DOCKER_Logs known as flaky: MESOS-4676
> 
> Other known frequently flaky tests that have not been tested this time 
> (filtered out):
> HealthCheckTest.ROOT_DOCKER_DockerHealthyTask
> HealthCheckTest.ROOT_DOCKER_DockerHealthStatusChange
> HookTest.ROOT_DOCKER_VerifySlavePreLaunchDockerHook
> DockerContainerizerTest.ROOT_DOCKER_Launch_Executor
> 
> Bernd
> 
>> On Feb 17, 2016, at 1:52 AM, Michael Park  wrote:
>> 
>> Hi all,
>> 
>> Please vote on releasing the following candidate as Apache Mesos 0.27.1.
>> 
>> 
>> 0.27.1 includes the following:
>> 
>> * Improved `systemd` integration.
>> * Ability to disable `systemd` integration.
>> 
>> * Additional performance improvements to /state endpoint.
>> * Removed duplicate "active" keys from the /state endpoint.
>> 
>> The CHANGELOG for the release is available at:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.27.1-rc1
>> 
>> 
>> The candidate for Mesos 0.27.1 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/0.27.1-rc1/mesos-0.27.1.tar.gz
>> 
>> The tag to be voted on is 0.27.1-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.27.1-rc1
>> 
>> The MD5 checksum of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/0.27.1-rc1/mesos-0.27.1.tar.gz.md5
>> 
>> The signature of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/0.27.1-rc1/mesos-0.27.1.tar.gz.asc
>> 
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>> 
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1102
>> 
>> Please vote on releasing this package as Apache Mesos 0.27.1!
>> 
>> The vote is open until Fri Feb 19 17:00:00 PST 2016 and passes if a majority 
>> of at least 3 +1 PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache Mesos 0.27.1
>> [ ] -1 Do not release this package because ...
>> 
>> Thanks,
>> 
>> Joris, MPark
>