Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

2018-09-04 Thread Andrew Schwartzmeyer
I think your approach would be fairly sound. That is, to change the 
logic to read the IDs from the info file instead of the paths. But I 
also think we can punt this for now (as I do not think a task ID like 
'Hello%3AWorld' is plausibly in use right now), and implement a fix for 
colons now that would remain compatible.


If we add encode/decode logic for colons on Windows, we do not introduce 
backward compatibility issues on other platforms (as we'd constrain the 
change to Windows), and in the future, we can safely replace the decode 
logic with your approach. That is to say, we implement the encoding as 
sparingly as possible, but implement it now, because it's kind of 
required, and we implement the decoding only as a stop-gap until we 
replace this logic with reading from the info file instead. If we later 
find another character in use that also needs to be encoded, we can then 
abstract the single encoding into a per-platform encoding set.
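
A minimal sketch of the colon-only, Windows-only encoding described
above, written in Python for brevity rather than the C++ of `paths.cpp`.
The helper names and the `windows` flag are hypothetical and are not
Mesos APIs.

```python
def encode_id_for_path(id_: str, windows: bool) -> str:
    """Percent-encode only ':' and only when targeting Windows."""
    return id_.replace(":", "%3A") if windows else id_


def decode_id_from_path(component: str, windows: bool) -> str:
    """Stop-gap inverse of encode_id_for_path; a no-op off Windows."""
    return component.replace("%3A", ":") if windows else component
```

Because other platforms take the no-op branch, paths written by existing
non-Windows agents would stay exactly as they are.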


Does this seem reasonable?

Thanks,

Andy

P.S. Sorry this took a while to get back to, I was out last week.

On 08/23/2018 3:34 pm, Chun-Hung Hsiao wrote:
I'm a bit concerned about the recovery logic and backward compatibility:

The changes we're making shouldn't affect existing users, and we should
try hard to avoid any future backward compatibility problem.

Say there is already some custom framework using task ID
'Hello%3AWorld'. If we blindly decode the task path during recovery, we
will get the wrong ID 'Hello:World'.
On the other hand, if we don't decode the task path during recovery,
then later on during checkpointing for the same task, we shouldn't
blindly encode the task ID, because it might create a different path,
and we'll need to introduce some migration code to avoid such
duplication.
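
The two failure modes above can be reproduced in a few lines. This is
only an illustration: Python's `urllib.parse` stands in for whatever
encoder Mesos would use, and the variable names are made up.

```python
from urllib.parse import quote, unquote

legacy_id = "Hello%3AWorld"  # a task ID that already contains a literal "%3A"

# An old agent wrote the ID into the path verbatim.
existing_component = legacy_id

# Case 1: blindly decoding the path during recovery yields a different ID.
decoded = unquote(existing_component)

# Case 2: blindly percent-encoding the same ID at the next checkpoint
# escapes the "%" itself, so the task lands in a second, different path.
reencoded = quote(legacy_id, safe="")
```

Here `decoded` is 'Hello:World' and `reencoded` is 'Hello%253AWorld',
neither of which matches what is already on disk.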


Fortunately, we do checkpoint the executor IDs and task IDs in the info
files under the meta dir.
So I'm considering the following design to minimize the backward
compatibility issue we might have:

* During recovery, we don't decode the recovered task path, but get the
  executor/task ID from the info file instead of relying on parsing the
  executor/task path.
* When checkpointing, we only encode executor/task IDs if they contain
  reserved characters.
* The set of reserved characters could be defined as a
  platform-dependent variable, similar to what we have done for
  `PATH_SEPARATOR`.

The above design would look a bit more complicated than just blindly
applying percent encoding when constructing checkpoint paths, but it
doesn't require extra checkpoint migration logic, and would keep the
exact same behavior we have now for "normal" executor/task IDs.
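
A rough sketch of this design, under several assumptions: it is Python
rather than Mesos's C++, the `RESERVED` table and function names are
hypothetical, the file name `task.info` is invented, and a JSON file
stands in for whatever format the real info files use.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical per-platform reserved sets; only ':' on Windows comes
# from the thread.
RESERVED = {"windows": {":"}, "posix": set()}


def path_component(id_: str, platform: str) -> str:
    """Encode an ID only if it contains a character reserved here."""
    reserved = RESERVED[platform]
    if not any(ch in reserved for ch in id_):
        return id_  # "normal" IDs keep exactly the path they have today
    return "".join(f"%{ord(ch):02X}" if ch in reserved else ch for ch in id_)


def checkpoint(meta_dir: Path, task_id: str, platform: str) -> Path:
    """Create the task directory and record the real ID in an info file."""
    task_dir = meta_dir / path_component(task_id, platform)
    task_dir.mkdir(parents=True, exist_ok=True)
    (task_dir / "task.info").write_text(json.dumps({"task_id": task_id}))
    return task_dir


def recover(meta_dir: Path) -> list[str]:
    """Read IDs from the info files; directory names are never decoded."""
    return sorted(
        json.loads(info.read_text())["task_id"]
        for info in meta_dir.glob("*/task.info")
    )


with tempfile.TemporaryDirectory() as root:
    meta_dir = Path(root)
    checkpoint(meta_dir, "Hello%3AWorld", "windows")
    recovered_ids = recover(meta_dir)  # the literal "%3A" survives recovery
```

Since recovery never decodes the directory name, an ID that already
contains '%3A' comes back unchanged.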

What do you guys think? Please feel free to raise any concern :)
And we don't need to implement the whole thing for now.
For example, we could start with just dealing with colons, and extend
the implementation later on, as long as the partial solution we're going
to have right now doesn't create future tech debt!

Best,
Chun-Hung

On Thu, Aug 23, 2018 at 1:42 PM Greg Mann wrote:

Thanks Andy! Responses inlined below.

> No: As the only character we've run into a problem with is `:`
> (MESOS-9109), it might not be worth it to generalize this to solve a
> bunch of problems that we haven't encountered.



It's true that I'm not aware of other scenarios where
filesystem-disallowed characters in task/executor IDs have caused issues
for users, and this issue has existed for a long time. However, when
feasible I would like to fix issues that we're aware of before they
cause problems for users, rather than after. I would suggest that since
we have one compelling case that we need to address now, it's worth
formulating an approach for the general case, so that we can be sure any
current work doesn't get in our way later on.


> I'm somewhat comfortable doing so only for Windows, as we don't really
> need to worry about the recovery scenario; but very uncomfortable about
> doing so for Linux etc., for precisely that reason.
>
> So expanding this is definitely up for debate; but we must fix the bug
> with `:`.


Indeed, addressing the general case may prove to be much more complex -
I can certainly identify with this situation, where a fix for a smaller
issue turns into a big project :)
It may turn out to be possible to implement a scoped-down solution for
the colon case now, and extend it later on. I think it would be good if
we could at least get an idea of how we want to handle the general case
now, so that any short-term solutions can be a constructive step toward
the long-term.

Cheers,
G



Re: Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

2018-08-22 Thread Andrew Schwartzmeyer
 > I'm surprised we haven't run into issues like this before on Linux.

Indeed!

> I'm wondering if this warrants a general solution that could take care of all 
> filesystem-disallowed characters. 

I think yes and no.

Yes: it'd be easy to just `process::http::encode(executorId/taskId)`
(though it might look funny) right where I'm currently applying the same
but only to `:`.

No: As the only character we've run into a problem with is `:`
(MESOS-9109), it might not be worth it to generalize this to solve a
bunch of problems that we haven't encountered.

I'm somewhat comfortable doing so only for Windows, as we don't really
need to worry about the recovery scenario; but very uncomfortable about
doing so for Linux etc., for precisely that reason.

So expanding this is definitely up for debate; but we must fix the bug
with `:`.
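
To make the two options concrete, here is a small comparison. Python's
`urllib.parse.quote` is only a stand-in for `process::http::encode` (the
exact set of characters each leaves untouched may differ), and the
function names and sample ID are invented for illustration.

```python
from urllib.parse import quote


def encode_whole_id(id_: str) -> str:
    """General option: percent-encode every reserved character."""
    # safe="" also encodes "/", which quote() leaves alone by default.
    return quote(id_, safe="")


def encode_colon_only(id_: str) -> str:
    """Narrow option: touch nothing but ':'."""
    return id_.replace(":", "%3A")


sample = "job:2018/08 run"
general = encode_whole_id(sample)
narrow = encode_colon_only(sample)
```

The general form also rewrites the slash and the space, which is what
makes it "look funny" and what changes existing paths on other
platforms.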

Thanks for the feedback,

Andy

On 08/22/2018 5:11 pm, Greg Mann wrote: 

> Thanks for addressing this Andy!! AFAIK we allow all characters in executor 
> and task IDs; I'm surprised we haven't run into issues like this before on 
> Linux. 
> 
> The percent-encoding approach seems fine to me. As long as the percent 
> character isn't an issue on any filesystems that we're interested in? As a 
> starting point, Wikipedia seems to have a decent survey of restrictions on 
> different filesystems here [5]. Looks like the percent character may be fine. 
> 
> I wonder if there are other characters we should be concerned about? I'm 
> guessing we should worry about slashes and backslashes as well? Seems like a 
> more general solution might help us avoid similar pitfalls in the future. 
> Perhaps we could just percent-encode executor and task IDs before we write to 
> disk? If we did this, we would have issues during recovery to consider, where 
> we need to look for "old" paths when recovering state from an "old" agent. 
> 
> In any case, I'm wondering if this warrants a general solution that could 
> take care of all filesystem-disallowed characters. WDYT? 
> 
> Cheers, 
> Greg 
> 
> On Tue, Aug 21, 2018 at 2:02 PM, Andrew Schwartzmeyer 
>  wrote:
> 
>> Hey all,
>> 
>> I have a set of patches up for MESOS-9109 that I need reviewed, starting 
>> here: https://reviews.apache.org/r/68297/ [1].
>> 
>> Eduard here was trying to use Chronos to schedule a task on a Windows agent, 
>> and found an error due to the fact that Chronos uses colons (as in `:`) in 
>> its generated framework (and task) IDs. Now, to maintain backward 
>> compatibility, we obviously can't disallow the use of `:` as there are 
>> frameworks already using it. However, this is a reserved character on 
>> Windows for file system paths 
>> (https://docs.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file [2]), 
>> so it cannot be in the path.
>> 
>> My first implementation simply applied `s/:/_COLON_/` to `frameworkId` and
>> `taskId` in the functions in `paths.cpp` which generate Mesos's filesystem
>> paths. While this worked, it's kind of a kludge. Or that is to say, it would
>> be nicer to use the percent-encoded representation, `%3A`, instead. Doing so,
>> however, revealed a bug in libprocess (MESOS-9168) that I have also fixed and
>> need reviewed, starting here: https://reviews.apache.org/r/68420/ [3]
>> 
>> So combining the two fixes, the chain maps `:` in `frameworkId` and `taskId` 
>> to `%3A` (and back when appropriate). This obviously doesn't fix any 
>> third-party tooling, but being Windows, I don't think there is any yet to 
>> worry about.
>> 
>> I wanted to get this in for 1.7, but due to a miscommunication, we were not 
>> able to land it in time. If you can, please review! Or if you have a better 
>> way of doing this, let me know!
>> 
>> Thanks,
>> 
>> Andy
>> 
>> P.S. Original discussion here: 
>> https://mesos.slack.com/archives/C1LPTK50T/p153332465396 [4] (our Slack 
>> archives seem to be down, so this is only available until Slack cycles out 
>> sadly).

 

Links:
--
[1] https://reviews.apache.org/r/68297/
[2]
https://docs.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file
[3] https://reviews.apache.org/r/68420/
[4] https://mesos.slack.com/archives/C1LPTK50T/p153332465396
[5]
https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations

Follow up to discussion regarding use : in paths on Windows (MESOS-9109)

2018-08-21 Thread Andrew Schwartzmeyer

Hey all,

I have a set of patches up for MESOS-9109 that I need reviewed, starting 
here: https://reviews.apache.org/r/68297/.


Eduard here was trying to use Chronos to schedule a task on a Windows 
agent, and found an error due to the fact that Chronos uses colons (as 
in `:`) in its generated framework (and task) IDs. Now, to maintain 
backward compatibility, we obviously can't disallow the use of `:` as 
there are frameworks already using it. However, this is a reserved 
character on Windows for file system paths 
(https://docs.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file), 
so it cannot be in the path.


My first implementation simply applied `s/:/_COLON_/` to `frameworkId`
and `taskId` in the functions in `paths.cpp` which generate Mesos's
filesystem paths. While this worked, it's kind of a kludge. Or that is
to say, it would be nicer to use the percent-encoded representation,
`%3A`, instead. Doing so, however, revealed a bug in libprocess
(MESOS-9168) that I have also fixed and need reviewed, starting here:
https://reviews.apache.org/r/68420/


So combining the two fixes, the chain maps `:` in `frameworkId` and 
`taskId` to `%3A` (and back when appropriate). This obviously doesn't 
fix any third-party tooling, but being Windows, I don't think there is 
any yet to worry about.


I wanted to get this in for 1.7, but due to a miscommunication, we were 
not able to land it in time. If you can, please review! Or if you have a 
better way of doing this, let me know!


Thanks,

Andy

P.S. Original discussion here: 
https://mesos.slack.com/archives/C1LPTK50T/p153332465396 (our Slack 
archives seem to be down, so this is only available until Slack cycles 
out sadly).


Re: [VOTE] Move the project repos to gitbox

2018-07-17 Thread Andrew Schwartzmeyer
 +1

On 07/17/2018 8:54 am, Zhitao Li wrote: 

> +1 
> 
> On Tue, Jul 17, 2018 at 8:10 AM James Peach  wrote: 
> 
>>> On Jul 17, 2018, at 7:58 AM, Vinod Kone  wrote:
>>> 
>>> Hi,
>>> 
>>> As discussed in another thread and in the committers sync, there seems to be 
>>> heavy interest in moving our project repos ("mesos", "mesos-site") from the 
>>> "git-wip" git server to the new "gitbox" server to better avail GitHub 
>>> integrations.
>>> 
>>> Please vote +1, 0, -1 regarding the move to gitbox. The vote will close in 
>>> 3 business days.
>> 
>> +1
> 
> -- 
> 
> Cheers,
> 
> Zhitao Li
 

Re: Backport Policy

2018-07-13 Thread Andrew Schwartzmeyer

I believe I fall somewhere between Alex and Ben.

As for deciding what to backport or not, I lean toward Alex's view of 
backporting as little as possible (and agree with his criteria). My 
reasoning is that all changes can have unforeseen consequences, which I 
believe is something to be actively avoided in already released 
versions. The reason for backporting patches to fix regressions is the 
same as the reason to avoid backporting as much as possible: keep 
behavior consistent (and safe) within a release. With that as the goal 
of a branch in maintenance mode, it makes sense to fix regressions, and 
make exceptions to fix CVEs and other critical/blocking issues.


As for who should decide what to backport, I lean toward Ben's view of 
the burden being on the committer. I don't think we should add more work 
for release managers, and I think the committer/shepherd obviously has 
the most understanding of the context around changes proposed for 
backport.


Here's an example of a recent bugfix which I backported: 
https://reviews.apache.org/r/67587/ (for MESOS-3790)


While normally I believe this change falls under "avoid due to 
unforeseen consequences," I made an exception as the bug was old, circa 
2015, (indicating it had been an issue for others), and was causing 
recurring failures in testing. The fix itself was very small, meaning it 
was easier to evaluate for possible side effects, so I felt a little 
safer in that regard. The effect of not having the fix was a fatal and 
undesired crash, which furthermore left troublesome side effects on the 
system (you couldn't bring the agent back up). And lastly, a dependent 
project (DC/OS) wanted it in their next bump, which necessitated 
backporting to the release they were pulling in.


I think in general we should backport only as necessary, and leave it on 
the committers to decide if backporting a particular change is 
necessary.


On 07/13/2018 12:54 am, Alex Rukletsov wrote:

This is exactly where our views differ, Ben : )

Ideally, I would like a release manager to have more ownership and less
manual work. In my imagination, a release manager has more power and
control over dates, features, backports and everything that is related
to "their" branch. I would also like us to back port as little as
possible, to simplify testing and releasing patch versions.

On Fri, Jul 13, 2018 at 1:17 AM, Benjamin Mahler wrote:



+user, it would probably be good to hear from users as well.

Please see the original proposal as well as Alex's proposal and let us
know your thoughts.

To continue the discussion from where Alex left off:

> Other bugs and significant improvements, e.g., performance, may be back
> ported, the release manager should ideally be the one who decides on this.

I'm a little puzzled by this, why is the release manager involved? As we
already document, backports occur when the bug is fixed, so this happens
in the steady state of development, not at release time. The release
manager only comes in at the time of the release itself, at which point
all backports have already happened and the release manager handles the
release process. Only blocker level issues can stop the release and
while the release manager has a strong say, we should generally agree on
what constitutes a release blocking issue.

Just to clarify my workflow, I generally backport every bug fix I commit
that applies cleanly, right after I commit it to master (with the
exceptions I listed below).

On Thu, Jul 12, 2018 at 8:39 AM, Alex Rukletsov wrote:

> I would like to back port as little as possible. I suggest the following
> criteria:
>
> * By default, regressions are back ported to existing release branches. A
> bug is considered a regression if the functionality is present in the
> previous minor or patch version and is not affected by the bug there.
>
> * Critical and blocker issues, e.g., a CVE, can be back ported.
>
> * Other bugs and significant improvements, e.g., performance, may be back
> ported, the release manager should ideally be the one who decides on
> this.
>
> On Thu, Jul 12, 2018 at 12:25 AM, Vinod Kone wrote:
>
> > Ben, thanks for the clarification. I'm in agreement with the points you
> > made.
> >
> > Once we have consensus, would you mind updating the doc?
> >
> > On Wed, Jul 11, 2018 at 5:15 PM Benjamin Mahler wrote:
> >
> > > I realized recently that we aren't all on the same page with
> > > backporting. We currently only document the following:
> > >
> > > "Typically the fix for an issue that is affecting supported releases
> > > lands on the master branch and is then backported to the release
> > > branch(es). In rare cases, the fix might directly go into a release
> > > branch without landing on master (e.g., fix / issue is not applicable
> > > to master)." [1]
> > >
> > > This leaves room for interpretation about what lies outside of
> > > "typical". Here's the simplest way I can explain what I stick to, and
> > > I'd like
Re: Build Failure

2018-03-19 Thread Andrew Schwartzmeyer
 It's sad that we fail at runtime like this when certain utilities
aren't installed. We're currently working to replace the extraction
logic with libarchive instead, so we'll no longer implicitly require
unzip/gunzip/tar/7z etc. I think it will probably make it into 1.6.0.
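
Until then, a pre-flight check along these lines can tell you up front
which of those utilities are missing. This is a generic Python sketch,
not part of Mesos; the function name and the tool tuple are just taken
from the list above.

```python
import shutil

# Tools named above as implicit runtime requirements.
EXTRACTION_TOOLS = ("unzip", "gunzip", "tar", "7z")


def missing_tools(tools=EXTRACTION_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` before running the test suite returns an empty
list when everything is installed.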

On 03/19/2018 4:38 pm, Shiv Deepak wrote: 

> Thanks. I installed unzip. That worked. 
> 
> On Mon, Mar 19, 2018 at 3:48 PM, Tomek Janiszewski  wrote:
> Do you have unzip installed? Can you try unzipping file like it's done in the 
> test? 
> 
> pon., 19.03.2018, 22:53 użytkownik Shiv Deepak  napisał: 
> Hello,
> 
> I am trying to build Mesos 1.5.0 from source on Ubuntu 16.04. 
> 
> I tried on Docker, VM, and EC2. Three test cases are failing no matter what. 
> 
> Here is the list. 
> 
> [ PASSED ] 1904 tests.
> [ FAILED ] 3 tests, listed below:
> [ FAILED ] FetcherTest.Unzip_ExtractFile
> [ FAILED ] FetcherTest.Unzip_ExtractInvalidFile
> [ FAILED ] FetcherTest.Unzip_ExtractFileWithDuplicatedEntries
>
> Here is the test output:
>
> [ RUN ] FetcherTest.Unzip_ExtractFile
> ../../src/tests/fetcher_tests.cpp:870: Failure
> (fetch).failure(): Failed to fetch all URIs for container
> '709de28f-5f71-439d-a032-072df865090f': exited with status 1
> [ FAILED ] FetcherTest.Unzip_ExtractFile (297 ms)
> [ RUN ] FetcherTest.Unzip_ExtractInvalidFile
> ../../src/tests/fetcher_tests.cpp:936: Failure
> Value of: os::exists(extractedFile)
> Actual: false
> Expected: true
> [ FAILED ] FetcherTest.Unzip_ExtractInvalidFile (201 ms)
> [ RUN ] FetcherTest.Unzip_ExtractFileWithDuplicatedEntries
> ../../src/tests/fetcher_tests.cpp:997: Failure
> (fetch).failure(): Failed to fetch all URIs for container
> 'dd749015-3d16-4926-b7f3-e1c96211a461': exited with status 1
> [ FAILED ] FetcherTest.Unzip_ExtractFileWithDuplicatedEntries (201 ms)
> 
> Is this expected or do I need to fix something? Can someone please point me 
> in the right direction? 
> 
> Thank you 
> 
> -- 
> 
> [1] 
> 
> Shiv Deepak
> Engineering Manager 
> 
> HackerRank 
> 
> Blog [2] / Twitter [3] / Linkedin [4] / Facebook [5]


Links:
--
[1] https://www.hackerrank.com/
[2] https://blog.hackerrank.com/
[3] https://twitter.com/hackerrank
[4] https://www.linkedin.com/company/hackerrank/
[5] https://www.facebook.com/hackerrank/


Re: Welcome Chun-Hung Hsiao as Mesos Committer and PMC Member

2018-03-11 Thread Andrew Schwartzmeyer

Congratulations Chun!

I apologize for not also giving you a +1, as I certainly would have, but
I just discovered my mailing list isn't working. Just a heads up, don't
let that happen to you too!


I look forward to continuing to work with you.

Cheers,

Andy

On 03/10/2018 9:14 pm, Jie Yu wrote:

Hi,

I am happy to announce that the PMC has voted Chun-Hung Hsiao as a new
committer and member of PMC for the Apache Mesos project. Please join me
to congratulate him!

Chun has been an active contributor for the past year. His main
contributions to the project include:
* Designed and implemented gRPC client support to libprocess
  (MESOS-7749)
* Designed and implemented Storage Local Resource Provider (MESOS-7235,
  MESOS-8374)
* Implemented part of the CSI support (MESOS-7235, MESOS-8374)

Chun is friendly and humble, but also intelligent, insightful, and
opinionated. I am confident that he will be a great addition to our
committer pool. Thanks Chun for all your contributions to the project so
far!

His committer checklist can be found here:
https://docs.google.com/document/d/1FjroAvjGa5NdP29zM7-2eg6tLPAzQRMUmCorytdEI_U/edit?usp=sharing

- Jie




Re: [VOTE] Release Apache Mesos 1.5.0 (rc2)

2018-02-07 Thread Andrew Schwartzmeyer

+1 (binding)

Passed internal CI and hand tests (debug and release builds). Only 
failure was due to a CI configuration only compatible with 1.6.


On 02/06/2018 4:19 pm, Andrew Schwartzmeyer wrote:

+0 (binding)

We're putting 1.5.0-rc2 through a hybrid DC/OS cluster end-to-end test
suite, but the results won't be back until tomorrow. If we could delay
a day, that'd be great.

On 02/05/2018 9:24 pm, Chun-Hung Hsiao wrote:

+1 (non-binding)

Tested with `make distcheck` with grpc disabled and enabled on mac.
Tested with `make distcheck DISTCHECK_CONFIGURE_FLAGS='--enable-grpc'`
on centos 7.

On Mon, Feb 5, 2018 at 8:33 PM, Vinod Kone <vinodk...@apache.org> wrote:



+1 (binding)

Tested on ASF CI. The red builds were known flaky tests regarding
checks/health checks.

*Revision*: f7e3872b0359c6095f8eeaefe408cb7dcef5bb83

   - refs/tags/1.5.0-rc2

Configuration Matrix                                     gcc      clang
centos:7
  --verbose --enable-libevent --enable-ssl   autotools   Failed   Not run
                                             cmake       Success  Not run
  --verbose                                  autotools   Failed   Not run
                                             cmake       Success  Not run
ubuntu:14.04
  --verbose --enable-libevent --enable-ssl   autotools   Success  Success
                                             cmake       Success  Success
  --verbose                                  autotools   Success  Success
                                             cmake       Success  Success

Build job: https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/47/

On Sat, Feb 3, 2018 at 11:11 AM, Zhitao Li <zhitaoli...@gmail.com> 
wrote:


> +1 (non-binding)
>
> Tested with running all tests on Debian/jessie server on AWS.
>
> On Fri, Feb 2, 2018 at 3

Re: [VOTE] Release Apache Mesos 1.5.0 (rc2)

2018-02-06 Thread Andrew Schwartzmeyer

+0 (binding)

We're putting 1.5.0-rc2 through a hybrid DC/OS cluster end-to-end test 
suite, but the results won't be back until tomorrow. If we could delay a 
day, that'd be great.


On 02/05/2018 9:24 pm, Chun-Hung Hsiao wrote:

+1 (non-binding)

Tested with `make distcheck` with grpc disabled and enabled on mac.
Tested with `make distcheck DISTCHECK_CONFIGURE_FLAGS='--enable-grpc'`
on centos 7.

On Mon, Feb 5, 2018 at 8:33 PM, Vinod Kone wrote:



+1 (binding)

Tested on ASF CI. The red builds were known flaky tests regarding
checks/health checks.

*Revision*: f7e3872b0359c6095f8eeaefe408cb7dcef5bb83

   - refs/tags/1.5.0-rc2

Configuration Matrix                                     gcc      clang
centos:7
  --verbose --enable-libevent --enable-ssl   autotools   Failed   Not run
                                             cmake       Success  Not run
  --verbose                                  autotools   Failed   Not run
                                             cmake       Success  Not run
ubuntu:14.04
  --verbose --enable-libevent --enable-ssl   autotools   Success  Success
                                             cmake       Success  Success
  --verbose                                  autotools   Success  Success
                                             cmake       Success  Success


On Sat, Feb 3, 2018 at 11:11 AM, Zhitao Li  
wrote:


> +1 (non-binding)
>
> Tested with running all tests on Debian/jessie server on AWS.
>
> On Fri, Feb 2, 2018 at 3:25 PM, Jie Yu  wrote:
>
>> +1
>>
>> Verified in our internal CI that `sudo make check` passed in CentOS 6,
>> CentOS7, Debian 8, Ubuntu 14.04, Ubuntu 16.04 (both w/ or w/o SSL
>> enabled).
>>
>>
>> On Thu, Feb 1, 2018 at 5:36 PM, Gilbert Song wrote:
>>
>> > Hi all,
>> >
>> > Please vote on releasing the following candidate as Apache Mesos
>> > 1.5.0.
>> >
>> > 1.5.0 includes the following:
>> > 

Re: RE: Struggling with running Docker container on Windows agent

2018-02-05 Thread Andrew Schwartzmeyer
 Awesome to hear it!

On 02/05/2018 3:30 pm, ajkf9uvxc ajkf9uvxc wrote: 

> After compiling the tip of master from 2018-02-02 on Windows and then doing 
> the exact same steps as before IT WORKS NOW ! docker ps shows the started 
> container. (In this case the network setting is "networks": [ { "mode": 
> "container/bridge" } ] ) 
> 
> Thanks a lot everybody for your help! 
> 
> On Friday, February 2, 2018, 4:26:11 p.m. PST, Akash Gupta (EOSG) 
> <aka...@microsoft.com> wrote: 
> 
> To summarize: 
> 
> If you update to the latest 1.5.x branch, then it will fix the docker $PATH 
> issue, but you will still run into problems with running docker containers, 
> because Mesos 1.5.x doesn't have the Windows docker network patches that the 
> master Mesos branch has. A work around is to send a "network=nat" through the 
> docker.parameters field in the json like this: 
> 
> "docker": { 
> 
> "parameters": [ 
> 
> { "key": "network", "value": "nat" } 
> 
> ] 
> 
> } 
> 
> If you update to the tip of master, then you should be able to run your job 
> by adding the "networks": [ { "mode": "container/bridge" } ] field to your 
> json. You need the network field because the default network setting in 
> marathon is `HOST` mode, which is Linux only. 
> 
> FROM: ajkf9uvxc ajkf9uvxc [mailto:ajkf9u...@yahoo.com] 
> SENT: Friday, February 2, 2018 3:27 PM
> TO: Andrew Schwartzmeyer <and...@schwartzmeyer.com>
> CC: user@mesos.apache.org; ulri...@activestate.com; Akash Gupta (EOSG) 
> <aka...@microsoft.com>; Joseph Wu <jos...@mesosphere.io>
> SUBJECT: Re: Struggling with running Docker container on Windows agent 
> 
> Yes, I got the same result with Marathon after adding "networks": [ { "mode": 
> "container/bridge" } ], . It sounds like there are multiple reasons for 
> compiling a newer version and the PATH issue you mentioned is the most likely 
> fix that will solve the problem. 
> 
> Knowing what to do next is a big step further. I will tell you how it worked 
> by mid next week. 
> 
> Thank you! 
> 
> On Friday, February 2, 2018, 2:50:34 p.m. PST, Andrew Schwartzmeyer 
> <and...@schwartzmeyer.com> wrote: 
> 
> Oh, geez, this is even simpler.
> 
> We'd temporarily broken the Docker containerizer in 1.5 when we fixed 
> environment variables. You need at least commit 1b6f9e90f, where we fixed it. 
> You don't have to move to the tip of master, we backported it (as af64bcb387) 
> to the 1.5.x branch.
> 
> The bug was: https://issues.apache.org/jira/browse/MESOS-8443 [1]
> 
> commit 1b6f9e90f
> Author: Akash Gupta <aka...@microsoft.com>
> Date: Fri Jan 12 16:23:39 2018 -0800
> 
> Windows: Fixed docker executor `PATH` variable.
> 
> The `docker` executable is not usually installed in
> `os::host_default_path()` on Windows, so the Executor cannot find it.
> Now, before launching the Executor, the Agent finds the directory
> containing `docker` and prepends it to the `PATH` given to the Executor
> so that both the Executor and Agent use the same `docker`.
> 
> Review: https://reviews.apache.org/r/65147 [2]
> 
> Sorry about that!
> 
> Andy 
> 
> On 02/02/2018 2:33 pm, ajkf9uvxc ajkf9uvxc wrote: 
> 
> Thanks for all your replies. 
> 
> Here is the stderr requested by Andy (good to know about this log): 
> 
> I0202 12:52:53.865368 7140 exec.cpp:162] Version: 1.5.0 
> 
> I0202 12:52:53.911371 7684 exec.cpp:237] Executor registered on agent 
> a0664e60-846a-42d0-9586-cf97e997eba3-S0 
> 
> I0202 12:52:53.915374 7192 executor.cpp:120] Registered docker executor on 
> 10.19.10.206 
> 
> I0202 12:52:53.920373 548 executor.cpp:160] Starting task 
> myattempt11_20180202203339zVpxc.07298e1c-085b-11e8-bc6d-ae95ed0c8d88 
> 
> I0202 12:52:59.252701 6752 executor.cpp:546] Failed to run docker container: 
> Failed to create subprocess 'docker': Could not launch child process: Failed 
> to call `CreateProcess`: docker -H npipe:./pipe/docker_engine run 
> --cpu-shares 1024 --memory 536870912 -e HOST=10.19.10.206 -e 
> MARATHON_APP_DOCKER_IMAGE=microsoft/windowsservercore -e 
> MARATHON_APP_ID=/myattempt11/20180202203339zVpxc -e MARATHON_APP_LABELS= -e 
> MARATHON_APP_RESOURCE_CPUS=1.0 -e MARATHON_APP_RESOURCE_DISK=1000.0 -e 
> MARATHON_APP_RESOURCE_MEM=512.0 -e 
> MARATHON_APP_VERSION=1970-01-01T00:00:00.000Z -e 
> MESOS_CONTAINER_NAME=mesos-74298e92-9700-486d-b211-a42e5fd0bf85 -e 
> MESOS_SANDBOX=C:mesossandbox -e 
> MESOS_TASK_ID=myattempt11_20180202203339zVpxc.07298e1c-085b-11e8-bc6d-ae95ed0c8d88
>  -v 
> c:mes

Re: Struggling with running Docker container on Windows agent

2018-02-02 Thread Andrew Schwartzmeyer
 Hello,

Would you please provide me with the executor's stderr log? This can be
found in the work directory on the agent, it should give us a bit more
information as to why it failed to start the task.

It'll be deeply nested, something like:

c:\mesos\work_dir\slaves\7dc02270-a4e1-4f59-9ad7-56bad5182ea4-S3\frameworks\eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-\executors\notepad.fcf078d1-084a-11e8-8f77-02421c3bc93c\runs\latest\stderr
(and stdout)
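
If it helps, here is a minimal Python sketch of how that path is put together, assuming the usual agent work directory layout (slaves, frameworks, executors, runs). The function name and the IDs in the usage note are hypothetical placeholders, not values from your cluster.

```python
from pathlib import PureWindowsPath


def executor_stderr_path(work_dir, agent_id, framework_id, executor_id, run="latest"):
    """Build the expected location of an executor's stderr log.

    Hypothetical helper: it only assembles the nested directory layout
    described above and does not check that the file actually exists.
    """
    return (
        PureWindowsPath(work_dir)
        / "slaves" / agent_id
        / "frameworks" / framework_id
        / "executors" / executor_id
        / "runs" / run
        / "stderr"
    )
```

For example, executor_stderr_path(r"c:\mesos\work_dir", "AGENT-S0", "FRAMEWORK-0000", "mytask.1234") gives the stderr path for the latest run of that executor; stdout sits next to it in the same directory.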

Thanks,

Andy

On 02/02/2018 1:30 pm, ajkf9uvxc ajkf9uvxc wrote: 

> Hi, 
> 
> I am trying to get a job in DCOS to run a docker container on a Windows agent 
> machine. DCOS was installed using the AWS CF template here: 
> https://downloads.dcos.io/dcos/stable/aws.html [1] (single master). 
> 
> The Windows agent is added: 
> 
> C:\mesos\mesos\build\src\mesos-agent.exe --attributes=os:windows 
> --containerizers=docker,mesos --hostname=10.19.10.206 --IP=10.19.10.206 
> --master=zk://10.22.1.94:2181/mesos [2] --work_dir=c:\mesos\work_dir 
> --launcher_dir=c:\mesos\mesos\build\src --log_dir=c:\mesos\logs 
> 
> And a simple job works: 
> 
> dcos.activestate.com [3] -> Job -> New 
> 
> { 
> 
> "id": "mywindowstest01", 
> 
> "labels": {}, 
> 
> "run": { 
> 
> "cpus": 0.01, 
> 
> "mem": 128, 
> 
> "disk": 0, 
> 
> "cmd": "C:\Windows\System32\cmd.exe /c echo helloworld > 
> c:\mesos\work_dir\helloworld2", 
> 
> "env": {}, 
> 
> "placement": { 
> 
> "constraints": [ 
> 
> { 
> 
> "attribute": "os", 
> 
> "operator": "EQ", 
> 
> "value": "windows" 
> 
> } 
> 
> ] 
> 
> }, 
> 
> "artifacts": [], 
> 
> "maxLaunchDelay": 3600, 
> 
> "volumes": [], 
> 
> "restart": { 
> 
> "policy": "NEVER" 
> 
> } 
> 
> }, 
> 
> "schedules": [] 
> 
> } 
> 
> creates: "c:\mesos\work_dir\helloworld2" 
> 
> The Windows agent has DockerCE installed and is set to run Windows containers 
> (tried with Linux containers as well and getting the same problem, but for 
> the purpose of this question let's stick to Windows containers) 
> 
> I confirmed that it's possible to run a Windows container manually, directly 
> on Windows 10 by starting a Powershell as Administrator and running: 
> 
> docker run -ti microsoft/windowsservercore 
> and 
> 
> docker run microsoft/windowsservercore 
> 
> Both commands create a new container (verified with "docker ps"; in addition, I 
> get a cmd.exe shell in the container for the first command). 
> 
> Now the problem: 
> 
> trying to run a container from DCOS does not work: 
> 
> dcos job add a.json 
> 
> with the json: 
> 
> { 
> "id": "myattempt11", 
> "labels": {}, 
> "run": { 
> "env": {}, 
> "cpus": 1.00, 
> "mem": 512, 
> "disk": 1000, 
> "placement": { 
> "constraints": [ 
> { 
> "attribute": "os", 
> "operator": "EQ", 
> "value": "windows" 
> } 
> ] 
> }, 
> "artifacts": [], 
> "maxLaunchDelay": 3600, 
> "docker": { 
> "image": "microsoft/windowsservercore" 
> }, 
> "restart": { 
> "policy": "NEVER" 
> } 
> }, 
> "schedules": [] 
> } 
> 
> Does not work: 
> 
> # dcos job add a.json 
> 
> # dcos job run myattempt11 
> Run ID: 20180202203339zVpxc 
> 
> The log on the Mesos Agent on Windows shows activity but not much information 
> about the problem (see "TASK_FAILED" at the end below): 
> 
> Log file created at: 2018/02/02 12:52:47 
> Running on machine: DESKTOP-JJK06UJ 
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg 
> I0202 12:52:47.330880 8388 logging.cpp:201] INFO level logging started! 
> I0202 12:52:47.335886 8388 main.cpp:365] Build: 2017-12-20 23:35:42 UTC by 
> Anne S Bell 
> I0202 12:52:47.335886 8388 main.cpp:366] Version: 1.5.0 
> I0202 12:52:47.337895 8388 main.cpp:373] Git SHA: 
> 327726d3c7272806c8f3c3b7479758c26e55fd43 
> I0202 12:52:47.35 8388 resolver.cpp:69] Creating default secret resolver 
> I0202 12:52:47.574883 8388 containerizer.cpp:304] Using isolation { 
> windows/cpu, filesystem/windows, windows/mem, environment_secret } 
> I0202 12:52:47.577883 8388 provisioner.cpp:299] Using default backend 'copy' 
> I0202 12:52:47.596886 3348 slave.cpp:262] Mesos agent started on 
> (1)@10.19.10.206:5051 [4] 
> I0202 12:52:47.597883 3348 slave.cpp:263] Flags at startup: 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="C:\Users\activeit\AppData\Local\Temp\mesos\store\appc" 
> --attributes="os:windows" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --container_disk_watch_interval="15secs" --containerizers="docker,mesos" 
> --default_role="*" --disk_watch_interval="1mins" --docker="docker" 
> --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io 
> [5]" --docker_remove_delay="6hrs" --docker_socket="//./pipe/docker_engine" 
> --docker_stop_timeout="0ns" 
> --docker_store_dir="C:\Users\activeit\AppData\Local\Temp\mesos\store\docker" 
> --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false" 
> --executor_registration_timeout="1mins" 
> 

Re: Documentation for Mesos On windows

2017-11-29 Thread Andrew Schwartzmeyer
 Hi sweta Das,

The mesos-master executable is not currently built on Windows. We're
waiting for leveldb to be ported to Windows to enable it
(https://github.com/google/leveldb/issues/519); however, most of the
code works anyway (as evidenced by `StartMaster()` in unit tests etc.).

If you'd like to try to build it, you can edit this code
https://github.com/apache/mesos/blob/5574681ddc7e053fd33c074023ff317394f6449c/src/master/CMakeLists.txt#L18
and remove the "NOT WIN32" guard around compiling the executable. I
can't promise it'll work, though; I haven't yet had a chance to try it.

Let me know how it goes!

Thanks,

Andy

On 11/29/2017 6:34 pm, Benjamin Mahler wrote: 

> +Andrew 
> 
> On Tue, Nov 28, 2017 at 5:41 PM, sweta Das  wrote:
> 
>> Hi 
>> 
>> Is there any other documentation than the one on mesos site 
>> http://mesos.apache.org/documentation/latest/windows/ [1] 
>> 
>> I was able to build Mesos on AWS on a Windows 2016 server, but I am not 
>> able to find any docs for starting the Mesos master on Windows. 
>> I understand that this is not recommended as of now, but for testing, can 
>> anyone tell me how I can start a master on Windows?
>> 
>> Sent from my iPhone
 

Links:
--
[1] http://mesos.apache.org/documentation/latest/windows/


Re: Welcome Andrew Schwartzmeyer as a new committer and PMC member!

2017-11-27 Thread Andrew Schwartzmeyer

Thank you everyone for the welcome!

It's been great working with you this past year, and I'm glad to 
continue making this great project even better.


Thanks again,

Andy

On 11/27/2017 3:00 pm, Joseph Wu wrote:

Hi devs & users,

I'm happy to announce that Andrew Schwartzmeyer has become a new committer
and member of the PMC for the Apache Mesos project.  Please join me in
congratulating him!

Andrew has been an active contributor to Mesos for about a year.  He has
been the primary contributor behind our efforts to change our default build
system to CMake and to port Mesos onto Windows.

Here is his committer candidate checklist for your perusal:
https://docs.google.com/document/d/1MfJRYbxxoX2-A-g8NEeryUdUi7FvIoNcdUbDbGguH1c/

Congrats Andy!
~Joseph