[jira] [Created] (MESOS-4744) mesos-execute should allow setting role

2016-02-22 Thread Jian Qiu (JIRA)
Jian Qiu created MESOS-4744:
---

 Summary: mesos-execute should allow setting role
 Key: MESOS-4744
 URL: https://issues.apache.org/jira/browse/MESOS-4744
 Project: Mesos
  Issue Type: Bug
  Components: cli
Reporter: Jian Qiu
Priority: Minor


It would be quite useful if we could set the role when running mesos-execute.
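As an illustration only (mesos-execute itself is C++, and the flag name is an assumption, not the actual patch), a sketch of how such a role option could be exposed on the command line:

```python
import argparse

# Hypothetical sketch of a --role flag for mesos-execute. The flag name and
# the "*" default role are assumptions for illustration, not Mesos code.
parser = argparse.ArgumentParser(prog="mesos-execute")
parser.add_argument("--master", required=True)
parser.add_argument("--name", required=True)
parser.add_argument("--command", required=True)
parser.add_argument("--role", default="*",
                    help="role the framework registers with")

args = parser.parse_args([
    "--master=127.0.0.1:5050",
    "--name=test",
    "--command=sleep 10",
    "--role=ads",
])
print(args.role)  # the role the one-off framework would register with
```

With no `--role` given, the framework would keep today's behavior of using the default role.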



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4580) Consider returning `202` (Accepted) for /reserve and related endpoints

2016-02-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158385#comment-15158385
 ] 

Jay Guo commented on MESOS-4580:


Hi, we find this issue interesting. Could it be confirmed and accepted so that 
we can proceed to contribute? We are quite new to this community and still 
getting familiar with its work processes. Thanks!

/IBM Pair: Jay Guo & Zhou Xing

> Consider returning `202` (Accepted) for /reserve and related endpoints
> --
>
> Key: MESOS-4580
> URL: https://issues.apache.org/jira/browse/MESOS-4580
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Jay Guo
>  Labels: mesosphere
>
> We currently return {{200}} (OK) when a POST to {{/reserve}}, {{/unreserve}}, 
> {{/create-volumes}}, and {{/destroy-volumes}} is validated successfully. This 
> is misleading, because the underlying operation is still dispatched 
> asynchronously and might subsequently fail. It would be more accurate to 
> return {{202}} (Accepted) instead.
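The distinction argued above can be sketched in a few lines (illustrative names only, not Mesos code): 200 claims the operation completed, while 202 only acknowledges that a validated request was queued and may still fail later.

```python
# Sketch of the proposed status-code semantics for /reserve and friends.
# The function and return strings are illustrative, not part of Mesos.
def interpret(status):
    if status == 200:
        return "completed"
    if status == 202:
        return "accepted, result pending"
    return "failed"

# A client POSTing to /reserve should treat 202 as "outcome unknown yet",
# not as success:
print(interpret(202))
```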





[jira] [Commented] (MESOS-4743) Mesos fetcher not working correctly on docker apps on CoreOS

2016-02-22 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158382#comment-15158382
 ] 

Shuai Lin commented on MESOS-4743:
--

Is your Mesos slave itself running inside a container? If so, this may be 
related to [MESOS-4249].

> Mesos fetcher not working correctly on docker apps on CoreOS
> 
>
> Key: MESOS-4743
> URL: https://issues.apache.org/jira/browse/MESOS-4743
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, fetcher
>Affects Versions: 0.26.0
>Reporter: Guillermo Rodriguez
>
> I initially sent this issue to the Marathon group. They asked me to send it 
> here. This is the original thread:
> https://github.com/mesosphere/marathon/issues/3179
> Then they closed it so I had to ask again with more proof.
> https://github.com/mesosphere/marathon/issues/3213
> In a nutshell, when I start a Marathon task that uses a URI while running 
> on CoreOS, the file is effectively fetched but not passed to the container. I 
> can see the file in the Mesos UI, but the file is not in the container. It is, 
> however, downloaded to another folder.
> It is very simple to test. The original ticket has two files attached: a 
> Marathon JSON for a Prometheus server and a prometheus.yml config file. The 
> objective is to start Prometheus with the config file.
> CoreOS 899.6
> Mesos 0.26
> Marathon 0.15.2
> Thanks!





[jira] [Assigned] (MESOS-4580) Consider returning `202` (Accepted) for /reserve and related endpoints

2016-02-22 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-4580:
--

Assignee: Jay Guo

> Consider returning `202` (Accepted) for /reserve and related endpoints
> --
>
> Key: MESOS-4580
> URL: https://issues.apache.org/jira/browse/MESOS-4580
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Jay Guo
>  Labels: mesosphere
>
> We currently return {{200}} (OK) when a POST to {{/reserve}}, {{/unreserve}}, 
> {{/create-volumes}}, and {{/destroy-volumes}} is validated successfully. This 
> is misleading, because the underlying operation is still dispatched 
> asynchronously and might subsequently fail. It would be more accurate to 
> return {{202}} (Accepted) instead.





[jira] [Comment Edited] (MESOS-4312) Porting Mesos on Power (ppc64le)

2016-02-22 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158359#comment-15158359
 ] 

Qian Zhang edited comment on MESOS-4312 at 2/23/16 6:20 AM:


Had some offline discussion with [~hartem] and [~vinodkone]; here is the final 
plan we all agreed on:
# Port Mesos to Power by upgrading/patching some third-party libraries.
#* Upgrade leveldb to v1.18, libev to v4.22, protobuf to v2.6.1, and 
ry-http-parser-1c3624a to nodejs/http-parser v2.6.1 for all platforms, since 
these are the latest upstream stable releases that officially support Power. 
Create a JIRA ticket for each of them and link it to this ticket.
#* Patch zookeeper, glog, and protobuf only for Power.
#* Change "src/linux/fs.cpp" for all platforms as I did in 
https://reviews.apache.org/r/42551/.
# Verify that the SSL, perf, and Docker related test cases work as expected on 
all platforms.
# Validate on all platforms with "sudo make distcheck" and "sudo make 
check --benchmark --gtest_filter="\*Benchmark\*"", and make sure the perf 
numbers are OK.
# Run compatibility tests between scheduler <=> master <=> slave <=> executor, 
with each component running either the patched/upgraded version or the stock 
one. Will contact Niklas/Kapil for their script to automate this and improve 
the script if needed.


was (Author: qianzhang):
Had some off-line discussion with [~hartem] and [~vinodkone], here is the final 
plan we all agree:
# Porting Mesos on Power by upgrading/patching some 3rd party libraries.
#* Update leveldb to v1.18, libev to v4.22, and protobuf to v2.6.1, 
ry-http-parser-1c3624a to nodejs/http-parser v2.6.1 for all platforms, since 
these are their latest upstream stable releases which has officially supported 
Power. Create JIRA ticket for each of them and link them to this JIRA ticket.
#* Patch zookeeper, glog, protobuf only for Power.
#* Change "src/linux/fs.cpp" for all platforms as what I did in 
https://reviews.apache.org/r/42551/.
# Verify SSL, perf and docker related test cases work as expected on all 
platforms.
# Do the validation on all platforms by "sudo make dist check" and "sudo make 
check --benchmark --gtest_filter="\*Benchmark\*"" and make sure the perf 
numbers are OK.
# Run compatibility tests between scheduler <=> master <=> slave <=> executor 
where each of the components is running either the patched/upgraded version or 
not. Will contact with Niklas/Kapil for their script and improve it if needed.

> Porting Mesos on Power (ppc64le)
> 
>
> Key: MESOS-4312
> URL: https://issues.apache.org/jira/browse/MESOS-4312
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> The goal of this ticket is to make IBM Power (ppc64le) a supported 
> hardware platform for Mesos. Currently the latest Mesos code cannot be 
> successfully built on ppc64le; we will resolve the build errors in this 
> ticket, and also make sure the Mesos test suite ("make check") can be run 
> successfully on ppc64le. 





[jira] [Comment Edited] (MESOS-4312) Porting Mesos on Power (ppc64le)

2016-02-22 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158359#comment-15158359
 ] 

Qian Zhang edited comment on MESOS-4312 at 2/23/16 6:19 AM:


Had some offline discussion with [~hartem] and [~vinodkone]; here is the final 
plan we all agreed on:
# Port Mesos to Power by upgrading/patching some third-party libraries.
#* Upgrade leveldb to v1.18, libev to v4.22, protobuf to v2.6.1, and 
ry-http-parser-1c3624a to nodejs/http-parser v2.6.1 for all platforms, since 
these are the latest upstream stable releases that officially support Power. 
Create a JIRA ticket for each of them and link it to this ticket.
#* Patch zookeeper, glog, and protobuf only for Power.
#* Change "src/linux/fs.cpp" for all platforms as I did in 
https://reviews.apache.org/r/42551/.
# Verify that the SSL, perf, and Docker related test cases work as expected on 
all platforms.
# Validate on all platforms with "sudo make distcheck" and "sudo make 
check --benchmark --gtest_filter="\*Benchmark\*"", and make sure the perf 
numbers are OK.
# Run compatibility tests between scheduler <=> master <=> slave <=> executor, 
with each component running either the patched/upgraded version or the stock 
one. Will contact Niklas/Kapil for their script and improve it if needed.


was (Author: qianzhang):
Had some off-line discussion with [~hartem] and [~vinodkone], here is the final 
plan we all agree:
# Porting Mesos on Power by upgrading/patching some 3rd party libraries.
#* Update leveldb to v1.18, libev to v4.22, and protobuf to v2.6.1, 
ry-http-parser-1c3624a to nodejs/http-parser v2.6.1 for all platforms, since 
these are their latest upstream stable releases which has officially supported 
Power. Create JIRA ticket for each of them and link them to this JIRA ticket.
#* Patch zookeeper, glog, protobuf only for Power.
#* Change "src/linux/fs.cpp" for all platforms as what I did in 
https://reviews.apache.org/r/42551/.
# Verify SSL, perf and docker related test cases work as expected on all 
platforms.
# Do the validation on all platforms by "sudo make dist check" and "sudo make 
check --benchmark --gtest_filter="*Benchmark*"" and make sure the perf numbers 
are OK.
# Run compatibility tests between scheduler <=> master <=> slave <=> executor 
where each of the components is running either the patched/upgraded version or 
not. Will contact with Niklas/Kapil for their script and improve it if needed.

> Porting Mesos on Power (ppc64le)
> 
>
> Key: MESOS-4312
> URL: https://issues.apache.org/jira/browse/MESOS-4312
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> The goal of this ticket is to make IBM Power (ppc64le) a supported 
> hardware platform for Mesos. Currently the latest Mesos code cannot be 
> successfully built on ppc64le; we will resolve the build errors in this 
> ticket, and also make sure the Mesos test suite ("make check") can be run 
> successfully on ppc64le. 





[jira] [Comment Edited] (MESOS-4312) Porting Mesos on Power (ppc64le)

2016-02-22 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158359#comment-15158359
 ] 

Qian Zhang edited comment on MESOS-4312 at 2/23/16 6:17 AM:


Had some offline discussion with [~hartem] and [~vinodkone]; here is the final 
plan we all agreed on:
# Port Mesos to Power by upgrading/patching some third-party libraries.
#* Upgrade leveldb to v1.18, libev to v4.22, protobuf to v2.6.1, and 
ry-http-parser-1c3624a to nodejs/http-parser v2.6.1 for all platforms, since 
these are the latest upstream stable releases that officially support Power. 
Create a JIRA ticket for each of them and link it to this ticket.
#* Patch zookeeper, glog, and protobuf only for Power.
#* Change "src/linux/fs.cpp" for all platforms as I did in 
https://reviews.apache.org/r/42551/.
# Verify that the SSL, perf, and Docker related test cases work as expected on 
all platforms.
# Validate on all platforms with "sudo make distcheck" and "sudo make 
check --benchmark --gtest_filter="*Benchmark*"", and make sure the perf 
numbers are OK.
# Run compatibility tests between scheduler <=> master <=> slave <=> executor, 
with each component running either the patched/upgraded version or the stock 
one. Will contact Niklas/Kapil for their script and improve it if needed.


was (Author: qianzhang):
Had some off-line discussion with [~hartem] and [~vinodkone], here is the final 
plan we all agree:
# Porting Mesos on Power by upgrading/patching some 3rd party libraries.
#* Update leveldb to v1.18, libev to v4.22, and protobuf to v2.6.1, 
ry-http-parser-1c3624a to nodejs/http-parser v2.6.1 for all platforms, since 
these are their latest upstream stable releases which has officially supported 
Power. Create JIRA ticket for each of them and link them to this JIRA ticket.
#* Patch zookeeper, glog, protobuf only for Power.
#* Change "src/linux/fs.cpp" for all platforms as what I did in 
https://reviews.apache.org/r/42551/.
# Verify SSL, perf and docker related test cases work as expected on all 
platforms.
# Do the validation on all platforms by "sudo make dist check" and "sudo make 
check --benchmark --gtest_filter="*Benchmark*"" and make sure the perf numbers 
are OK.
# Run compatibility tests between scheduler <-> master <-> slave <-> executor 
where each of the components is running either the patched/upgraded version or 
not. Will contact with Niklas/Kapil for their script and improve it if needed.

> Porting Mesos on Power (ppc64le)
> 
>
> Key: MESOS-4312
> URL: https://issues.apache.org/jira/browse/MESOS-4312
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> The goal of this ticket is to make IBM Power (ppc64le) a supported 
> hardware platform for Mesos. Currently the latest Mesos code cannot be 
> successfully built on ppc64le; we will resolve the build errors in this 
> ticket, and also make sure the Mesos test suite ("make check") can be run 
> successfully on ppc64le. 





[jira] [Commented] (MESOS-4312) Porting Mesos on Power (ppc64le)

2016-02-22 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158359#comment-15158359
 ] 

Qian Zhang commented on MESOS-4312:
---

Had some offline discussion with [~hartem] and [~vinodkone]; here is the final 
plan we all agreed on:
# Port Mesos to Power by upgrading/patching some third-party libraries.
#* Upgrade leveldb to v1.18, libev to v4.22, protobuf to v2.6.1, and 
ry-http-parser-1c3624a to nodejs/http-parser v2.6.1 for all platforms, since 
these are the latest upstream stable releases that officially support Power. 
Create a JIRA ticket for each of them and link it to this ticket.
#* Patch zookeeper, glog, and protobuf only for Power.
#* Change "src/linux/fs.cpp" for all platforms as I did in 
https://reviews.apache.org/r/42551/.
# Verify that the SSL, perf, and Docker related test cases work as expected on 
all platforms.
# Validate on all platforms with "sudo make distcheck" and "sudo make 
check --benchmark --gtest_filter="*Benchmark*"", and make sure the perf 
numbers are OK.
# Run compatibility tests between scheduler <-> master <-> slave <-> executor, 
with each component running either the patched/upgraded version or the stock 
one. Will contact Niklas/Kapil for their script and improve it if needed.

> Porting Mesos on Power (ppc64le)
> 
>
> Key: MESOS-4312
> URL: https://issues.apache.org/jira/browse/MESOS-4312
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> The goal of this ticket is to make IBM Power (ppc64le) a supported 
> hardware platform for Mesos. Currently the latest Mesos code cannot be 
> successfully built on ppc64le; we will resolve the build errors in this 
> ticket, and also make sure the Mesos test suite ("make check") can be run 
> successfully on ppc64le. 





[jira] [Updated] (MESOS-3024) HTTP endpoint authN is enabled merely by specifying --credentials

2016-02-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3024:
--
Fix Version/s: 0.27.0

> HTTP endpoint authN is enabled merely by specifying --credentials
> -
>
> Key: MESOS-3024
> URL: https://issues.apache.org/jira/browse/MESOS-3024
> Project: Mesos
>  Issue Type: Bug
>  Components: master, security
>Reporter: Adam B
>Assignee: Till Toenshoff
>  Labels: authentication, http, mesosphere
> Fix For: 0.27.0
>
>
> If I set `--credentials` on the master, framework and slave authentication 
> are allowed, but not required. On the other hand, HTTP authentication is now 
> required for authenticated endpoints (currently only `/shutdown`). That means 
> I cannot enable framework or slave authentication without also enabling 
> HTTP endpoint authentication. This is undesirable.
> Framework and slave authentication have separate flags (`\--authenticate` and 
> `\--authenticate_slaves`) to require authentication for each. It would be 
> great if there were also such a flag for HTTP authentication. Or maybe we 
> should get rid of these flags altogether and rely on ACLs to determine which 
> unauthenticated principals are even allowed to authenticate for each 
> endpoint/action.
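The decoupling requested above can be sketched as follows (the `authenticate_http` flag name is hypothetical, chosen by analogy with the existing flags; this is not Mesos code):

```python
# Sketch: HTTP endpoint authN should be gated by its own flag rather than
# being implied by --credentials. Flag names here are assumptions.
def http_authn_required(credentials, authenticate_http):
    # Credentials must exist for authN to be possible at all, but HTTP
    # endpoints only require it when explicitly asked for.
    return credentials is not None and authenticate_http

# Setting --credentials alone no longer forces HTTP authN:
print(http_authn_required({"principal": "secret"}, False))
```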





[jira] [Commented] (MESOS-3481) Add const accessor to Master flags

2016-02-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158302#comment-15158302
 ] 

Jay Guo commented on MESOS-3481:


We have submitted a patch for review: https://reviews.apache.org/r/43868/

IBM community pair: Jay Guo & Zhou Xing

> Add const accessor to Master flags
> --
>
> Key: MESOS-3481
> URL: https://issues.apache.org/jira/browse/MESOS-3481
> Project: Mesos
>  Issue Type: Task
>Reporter: Joseph Wu
>Assignee: zhou xing
>Priority: Trivial
>  Labels: mesosphere, newbie
>
> It would make sense to have an accessor to the master's flags, especially for 
> tests.
> For example, see [this 
> test|https://github.com/apache/mesos/blob/2876b8c918814347dd56f6f87d461e414a90650a/src/tests/master_maintenance_tests.cpp#L1231-L1235].
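The idea can be sketched in Python for brevity (Mesos itself is C++, and the class and flag names below are illustrative): tests get a read-only view of the master's flags instead of reaching into its internal state.

```python
# Illustrative sketch of a const-style accessor: callers can inspect the
# flags but cannot mutate the master's own copy.
class Master:
    def __init__(self, flags):
        self._flags = dict(flags)  # private storage

    @property
    def flags(self):
        # Return a copy, so the master's state stays immutable to callers.
        return dict(self._flags)

master = Master({"authenticate": True, "port": 5050})
snapshot = master.flags
snapshot["port"] = 0           # mutating the snapshot...
print(master.flags["port"])    # ...leaves the master's flags untouched
```

In C++ the equivalent would simply be a `const Flags&` accessor on the master.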





[jira] [Assigned] (MESOS-3727) File permission inconsistency for mesos-master executable and mesos-init-wrapper.

2016-02-22 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-3727:
--

Assignee: (was: Jay Guo)

> File permission inconsistency for mesos-master executable and 
> mesos-init-wrapper.
> -
>
> Key: MESOS-3727
> URL: https://issues.apache.org/jira/browse/MESOS-3727
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Sarjeet Singh
>Priority: Trivial
>
> There seems to be a file permission inconsistency between the mesos-master 
> executable and the mesos-init-wrapper script in Mesos 0.25.
> node-1:~# dpkg -l | grep mesos
> ii  mesos   0.25.0-0.2.70.ubuntu1404
> node-1:~# ls -ld /usr/sbin/mesos-master
> -rwxr-xr-x 1 root root 289173 Oct 12 14:07 /usr/sbin/mesos-master
> node-1:~# ls -ld /usr/bin/mesos-init-wrapper
> -rwxrwx--- 1 root root 5202 Oct  1 11:17 /usr/bin/mesos-init-wrapper
> Observed the issue when trying to execute mesos-master as a non-root user: 
> since the init wrapper grants no permissions to non-root users, it could not 
> be executed and mesos-master did not start.
> Should we make these file permissions consistent between the executable & the 
> init script? 





[jira] [Comment Edited] (MESOS-3727) File permission inconsistency for mesos-master executable and mesos-init-wrapper.

2016-02-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158251#comment-15158251
 ] 

Jay Guo edited comment on MESOS-3727 at 2/23/16 4:11 AM:
-

We have just confirmed that the problem persists in *0.27.0-0.2.190.ubuntu1404*. 
We should modify the permissions of the following files in the release:

/usr/bin/mesos-init-wrapper 770 --> 775
/etc/default/mesos   640 --> 644
/etc/default/mesos-master   640 --> 644
/etc/default/mesos-slave  640 --> 644

However, where is the Mesos release maintained?


was (Author: guoger):
We have just confirmed in *0.27.0-0.2.190.ubuntu1404*, the problem persists. We 
should modify permissions of following files in release:
/usr/bin/mesos-init-wrapper 770 --> 775
/etc/default/mesos   640 --> 644
/etc/default/mesos-master   640 --> 644
/etc/default/mesos-slave  640 --> 644

However, where is Mesos release maintained?

> File permission inconsistency for mesos-master executable and 
> mesos-init-wrapper.
> -
>
> Key: MESOS-3727
> URL: https://issues.apache.org/jira/browse/MESOS-3727
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Sarjeet Singh
>Assignee: Jay Guo
>Priority: Trivial
>
> There seems to be a file permission inconsistency between the mesos-master 
> executable and the mesos-init-wrapper script in Mesos 0.25.
> node-1:~# dpkg -l | grep mesos
> ii  mesos   0.25.0-0.2.70.ubuntu1404
> node-1:~# ls -ld /usr/sbin/mesos-master
> -rwxr-xr-x 1 root root 289173 Oct 12 14:07 /usr/sbin/mesos-master
> node-1:~# ls -ld /usr/bin/mesos-init-wrapper
> -rwxrwx--- 1 root root 5202 Oct  1 11:17 /usr/bin/mesos-init-wrapper
> Observed the issue when trying to execute mesos-master as a non-root user: 
> since the init wrapper grants no permissions to non-root users, it could not 
> be executed and mesos-master did not start.
> Should we make these file permissions consistent between the executable & the 
> init script? 





[jira] [Commented] (MESOS-3727) File permission inconsistency for mesos-master executable and mesos-init-wrapper.

2016-02-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158251#comment-15158251
 ] 

Jay Guo commented on MESOS-3727:


We have just confirmed that the problem persists in *0.27.0-0.2.190.ubuntu1404*. 
We should modify the permissions of the following files in the release:
/usr/bin/mesos-init-wrapper 770 --> 775
/etc/default/mesos   640 --> 644
/etc/default/mesos-master   640 --> 644
/etc/default/mesos-slave  640 --> 644

However, where is the Mesos release maintained?
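The proposed changes (770 -> 775 and 640 -> 644) only add bits for "other" users; a quick sketch of the same arithmetic with the standard `stat` constants:

```python
import stat

# Each proposed change widens access for "other" users without touching
# owner/group bits: 0o770 -> 0o775 adds read+execute, 0o640 -> 0o644 adds read.
def widen_for_others(mode, extra_other_bits):
    return mode | extra_other_bits

wrapper = widen_for_others(0o770, stat.S_IROTH | stat.S_IXOTH)  # init wrapper
config = widen_for_others(0o640, stat.S_IROTH)                  # /etc/default files
print(oct(wrapper), oct(config))
```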

> File permission inconsistency for mesos-master executable and 
> mesos-init-wrapper.
> -
>
> Key: MESOS-3727
> URL: https://issues.apache.org/jira/browse/MESOS-3727
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Sarjeet Singh
>Assignee: Jay Guo
>Priority: Trivial
>
> There seems to be a file permission inconsistency between the mesos-master 
> executable and the mesos-init-wrapper script in Mesos 0.25.
> node-1:~# dpkg -l | grep mesos
> ii  mesos   0.25.0-0.2.70.ubuntu1404
> node-1:~# ls -ld /usr/sbin/mesos-master
> -rwxr-xr-x 1 root root 289173 Oct 12 14:07 /usr/sbin/mesos-master
> node-1:~# ls -ld /usr/bin/mesos-init-wrapper
> -rwxrwx--- 1 root root 5202 Oct  1 11:17 /usr/bin/mesos-init-wrapper
> Observed the issue when trying to execute mesos-master as a non-root user: 
> since the init wrapper grants no permissions to non-root users, it could not 
> be executed and mesos-master did not start.
> Should we make these file permissions consistent between the executable & the 
> init script? 





[jira] [Updated] (MESOS-4743) Mesos fetcher not working correctly on docker apps on CoreOS

2016-02-22 Thread Guillermo Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guillermo Rodriguez updated MESOS-4743:
---
Description: 
I initially sent this issue to the Marathon group. They asked me to send it 
here. This is the original thread:
https://github.com/mesosphere/marathon/issues/3179

Then they closed it so I had to ask again with more proof.
https://github.com/mesosphere/marathon/issues/3213

In a nutshell, when I start a Marathon task that uses a URI while running on 
CoreOS, the file is effectively fetched but not passed to the container. I can 
see the file in the Mesos UI, but the file is not in the container. It is, 
however, downloaded to another folder.

It is very simple to test. The original ticket has two files attached: a 
Marathon JSON for a Prometheus server and a prometheus.yml config file. The 
objective is to start Prometheus with the config file.

CoreOS 899.6
Mesos 0.26
Marathon 0.15.2

Thanks!

  was:
I initially sent this issue to the Marathon group. They asked me to send it 
here. This is the original thread:
https://github.com/mesosphere/marathon/issues/3179

Then they closed it so I had to ask again with more proof.
https://github.com/mesosphere/marathon/issues/3213

In a nutshell, when I start a Marathon task that uses the URI while running on 
CoreOS. The file is effectively fetched but not passed to the container. I can 
see the file in the mesos UI but the file is not in the container. It is, 
however, downloaded to another folder.

It is very simple to test. The original ticket has two files attaches with a 
Marathon JSON for a Prometheus server and a prometheus.yml config file. The 
objective is to start prometheus with the config file.

Thanks!


> Mesos fetcher not working correctly on docker apps on CoreOS
> 
>
> Key: MESOS-4743
> URL: https://issues.apache.org/jira/browse/MESOS-4743
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, fetcher
>Affects Versions: 0.26.0
>Reporter: Guillermo Rodriguez
>
> I initially sent this issue to the Marathon group. They asked me to send it 
> here. This is the original thread:
> https://github.com/mesosphere/marathon/issues/3179
> Then they closed it so I had to ask again with more proof.
> https://github.com/mesosphere/marathon/issues/3213
> In a nutshell, when I start a Marathon task that uses a URI while running 
> on CoreOS, the file is effectively fetched but not passed to the container. I 
> can see the file in the Mesos UI, but the file is not in the container. It is, 
> however, downloaded to another folder.
> It is very simple to test. The original ticket has two files attached: a 
> Marathon JSON for a Prometheus server and a prometheus.yml config file. The 
> objective is to start Prometheus with the config file.
> CoreOS 899.6
> Mesos 0.26
> Marathon 0.15.2
> Thanks!





[jira] [Created] (MESOS-4743) Mesos fetcher not working correctly on docker apps on CoreOS

2016-02-22 Thread Guillermo Rodriguez (JIRA)
Guillermo Rodriguez created MESOS-4743:
--

 Summary: Mesos fetcher not working correctly on docker apps on 
CoreOS
 Key: MESOS-4743
 URL: https://issues.apache.org/jira/browse/MESOS-4743
 Project: Mesos
  Issue Type: Bug
  Components: docker, fetcher
Affects Versions: 0.26.0
Reporter: Guillermo Rodriguez


I initially sent this issue to the Marathon group. They asked me to send it 
here. This is the original thread:
https://github.com/mesosphere/marathon/issues/3179

Then they closed it so I had to ask again with more proof.
https://github.com/mesosphere/marathon/issues/3213

In a nutshell, when I start a Marathon task that uses a URI while running on 
CoreOS, the file is effectively fetched but not passed to the container. I can 
see the file in the Mesos UI, but the file is not in the container. It is, 
however, downloaded to another folder.

It is very simple to test. The original ticket has two files attached: a 
Marathon JSON for a Prometheus server and a prometheus.yml config file. The 
objective is to start Prometheus with the config file.

Thanks!





[jira] [Commented] (MESOS-4742) Design doc for CNI isolator

2016-02-22 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158175#comment-15158175
 ] 

Qian Zhang commented on MESOS-4742:
---

https://docs.google.com/document/d/1FFZwPHPZqS17cRQvsbbWyQbZpwIoHFR_N6AAApRv514/edit?usp=sharing

> Design doc for CNI isolator
> ---
>
> Key: MESOS-4742
> URL: https://issues.apache.org/jira/browse/MESOS-4742
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> This ticket is for the design of an isolator for the Container Network 
> Interface (CNI).





[jira] [Created] (MESOS-4742) Design doc for CNI isolator

2016-02-22 Thread Qian Zhang (JIRA)
Qian Zhang created MESOS-4742:
-

 Summary: Design doc for CNI isolator
 Key: MESOS-4742
 URL: https://issues.apache.org/jira/browse/MESOS-4742
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Reporter: Qian Zhang
Assignee: Qian Zhang


This ticket is for the design of an isolator for the Container Network Interface (CNI).





[jira] [Updated] (MESOS-4741) Add role information for static reservation in /master/roles

2016-02-22 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-4741:

Description: 
In {{/master/roles}}, statically reserved roles should be shown even if there 
are no tasks in them.

{code}
Klauss-MacBook-Pro:mesos klaus$ curl http://localhost:5050/master/roles.json | 
python -m json.tool
  % Total% Received % Xferd  Average Speed   TimeTime Time  Current
 Dload  Upload   Total   SpentLeft  Speed
10093  100930 0  13907  0 --:--:-- --:--:-- --:--:-- 15500
{
"roles": [
{
"frameworks": [],
"name": "*",
"resources": {
"cpus": 0,
"disk": 0,
"mem": 0
},
"weight": 1.0
}
]
}
{code}

After submitting tasks to r1, the r1 role shows up:

{code}
Klauss-MacBook-Pro:mesos klaus$ curl http://localhost:5050/master/roles | 
python -m json.tool
  % Total% Received % Xferd  Average Speed   TimeTime Time  Current
 Dload  Upload   Total   SpentLeft  Speed
100   221  100   2210 0  32721  0 --:--:-- --:--:-- --:--:-- 36833
{
"roles": [
{
"frameworks": [],
"name": "*",
"resources": {
"cpus": 0,
"disk": 0,
"mem": 0
},
"weight": 1.0
},
{
"frameworks": [
"b4f15a2e-5d9a-4d31-a29e-7737af41c8e4-0002"
],
"name": "r1",
"resources": {
"cpus": 1.0,
"disk": 0,
"mem": 0
},
"weight": 1.0
}
]
}
{code}

  was:
In {{/master/roles}}, it should show static reservation roles if there's no 
tasks.

{code}
Klauss-MacBook-Pro:mesos klaus$ curl http://localhost:5050/master/roles.json | 
python -m json.tool
  % Total% Received % Xferd  Average Speed   TimeTime Time  Current
 Dload  Upload   Total   SpentLeft  Speed
100   221  100   2210 0  28612  0 --:--:-- --:--:-- --:--:-- 31571
{
"roles": [
{
"frameworks": [],
"name": "*",
"resources": {
"cpus": 0,
"disk": 0,
"mem": 0
},
"weight": 1.0
},
{
"frameworks": [
"b4f15a2e-5d9a-4d31-a29e-7737af41c8e4-0002"
],
"name": "r1",
"resources": {
"cpus": 1.0,
"disk": 0,
"mem": 0
},
"weight": 1.0
}
]
}
{code}

After submit tasks to r1, it'll show roles.

{code}
Klauss-MacBook-Pro:mesos klaus$ curl http://localhost:5050/master/roles | 
python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   221  100   221    0     0  32721      0 --:--:-- --:--:-- --:--:-- 36833
{
    "roles": [
        {
            "frameworks": [],
            "name": "*",
            "resources": {
                "cpus": 0,
                "disk": 0,
                "mem": 0
            },
            "weight": 1.0
        },
        {
            "frameworks": [
                "b4f15a2e-5d9a-4d31-a29e-7737af41c8e4-0002"
            ],
            "name": "r1",
            "resources": {
                "cpus": 1.0,
                "disk": 0,
                "mem": 0
            },
            "weight": 1.0
        }
    ]
}
{code}


> Add role information for static reservation in /master/roles
> 
>
> Key: MESOS-4741
> URL: https://issues.apache.org/jira/browse/MESOS-4741
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> In {{/master/roles}}, it should show static reservation roles if there are no 
> tasks.
> {code}
> Klauss-MacBook-Pro:mesos klaus$ curl http://localhost:5050/master/roles.json 
> | python -m json.tool
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100    93  100    93    0     0  13907      0 --:--:-- --:--:-- --:--:-- 15500
> {
>     "roles": [
>         {
>             "frameworks": [],
>             "name": "*",
>             "resources": {
>                 "cpus": 0,
>                 "disk": 0,
>                 "mem": 0
>             },
>             "weight": 1.0
>         }
>     ]
> }
> {code}
> After submitting tasks to r1, it'll show the roles.
> {code}
> Klauss-MacBook-Pro:mesos klaus$ curl http://localhost:5050/master/ro

[jira] [Created] (MESOS-4741) Add role information for static reservation in /master/roles

2016-02-22 Thread Klaus Ma (JIRA)
Klaus Ma created MESOS-4741:
---

 Summary: Add role information for static reservation in 
/master/roles
 Key: MESOS-4741
 URL: https://issues.apache.org/jira/browse/MESOS-4741
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API
Reporter: Klaus Ma
Assignee: Klaus Ma


In {{/master/roles}}, it should show static reservation roles if there are no 
tasks.

{code}
Klauss-MacBook-Pro:mesos klaus$ curl http://localhost:5050/master/roles.json | 
python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   221  100   221    0     0  28612      0 --:--:-- --:--:-- --:--:-- 31571
{
    "roles": [
        {
            "frameworks": [],
            "name": "*",
            "resources": {
                "cpus": 0,
                "disk": 0,
                "mem": 0
            },
            "weight": 1.0
        },
        {
            "frameworks": [
                "b4f15a2e-5d9a-4d31-a29e-7737af41c8e4-0002"
            ],
            "name": "r1",
            "resources": {
                "cpus": 1.0,
                "disk": 0,
                "mem": 0
            },
            "weight": 1.0
        }
    ]
}
{code}

After submitting tasks to r1, it'll show the roles.

{code}
Klauss-MacBook-Pro:mesos klaus$ curl http://localhost:5050/master/roles | 
python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   221  100   221    0     0  32721      0 --:--:-- --:--:-- --:--:-- 36833
{
    "roles": [
        {
            "frameworks": [],
            "name": "*",
            "resources": {
                "cpus": 0,
                "disk": 0,
                "mem": 0
            },
            "weight": 1.0
        },
        {
            "frameworks": [
                "b4f15a2e-5d9a-4d31-a29e-7737af41c8e4-0002"
            ],
            "name": "r1",
            "resources": {
                "cpus": 1.0,
                "disk": 0,
                "mem": 0
            },
            "weight": 1.0
        }
    ]
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2198) Document that TaskIDs should not be reused

2016-02-22 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang reassigned MESOS-2198:
-

Assignee: Qian Zhang

> Document that TaskIDs should not be reused
> --
>
> Key: MESOS-2198
> URL: https://issues.apache.org/jira/browse/MESOS-2198
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation, framework
>Reporter: Robert Lacroix
>Assignee: Qian Zhang
>  Labels: documentation
>
> Let's update the documentation for TaskID to indicate that reuse is not 
> recommended, as per the discussion below.
> -
> Old Summary: Scheduler#statusUpdate should not be called multiple times for 
> the same status update
> Currently Scheduler#statusUpdate can be called multiple times for the same 
> status update, for example when the slave retransmits a status update because 
> it's not acknowledged in time. Especially for terminal status updates, this 
> can lead to unexpected scheduler behavior when task IDs are being reused.
> Consider this scenario:
> * Scheduler schedules task
> * Task fails, slave sends TASK_FAILED
> * Scheduler is busy and libmesos doesn't acknowledge update in time
> * Slave retransmits TASK_FAILED
> * Scheduler eventually receives first TASK_FAILED and reschedules task
> * Second TASK_FAILED triggers statusUpdate again and the scheduler can't 
> determine if the TASK_FAILED belongs to the first or second run of the task.
> It would be a lot better if libmesos would dedupe status updates and only 
> call Scheduler#statusUpdate once per status update it received. Retries with 
> the same UUID shouldn't cause Scheduler#statusUpdate to be executed again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4602) Invalid usage of ATOMIC_FLAG_INIT in member initialization

2016-02-22 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158047#comment-15158047
 ] 

Yong Tang commented on MESOS-4602:
--

That seems to be an easy fix. Just submitted a review request:
https://reviews.apache.org/r/43859/

> Invalid usage of ATOMIC_FLAG_INIT in member initialization
> --
>
> Key: MESOS-4602
> URL: https://issues.apache.org/jira/browse/MESOS-4602
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Benjamin Bannier
>Assignee: Yong Tang
>  Labels: newbie, tech-debt
>
> MESOS-2925 fixed a few instances where {{ATOMIC_FLAG_INIT}} was used in 
> initializer lists, but missed to fix 
> {{3rdparty/libprocess/src/libevent_ssl_socket.cpp}} (even though the 
> corresponding header was touched).
> There, {{LibeventSSLSocketImpl}}'s {{lock}} member is still (incorrectly) 
> initialized in initializer lists, even though the member is already 
> initialized in the class declaration, so it appears they should be dropped.
> Clang from trunk incorrectly diagnoses the initializations in the initializer 
> lists as benign redundant braces in initialization of a scalar, but they 
> should be fixed for the reasons stated in MESOS-2925.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4602) Invalid usage of ATOMIC_FLAG_INIT in member initialization

2016-02-22 Thread Yong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yong Tang reassigned MESOS-4602:


Assignee: Yong Tang

> Invalid usage of ATOMIC_FLAG_INIT in member initialization
> --
>
> Key: MESOS-4602
> URL: https://issues.apache.org/jira/browse/MESOS-4602
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Benjamin Bannier
>Assignee: Yong Tang
>  Labels: newbie, tech-debt
>
> MESOS-2925 fixed a few instances where {{ATOMIC_FLAG_INIT}} was used in 
> initializer lists, but missed to fix 
> {{3rdparty/libprocess/src/libevent_ssl_socket.cpp}} (even though the 
> corresponding header was touched).
> There, {{LibeventSSLSocketImpl}}'s {{lock}} member is still (incorrectly) 
> initialized in initializer lists, even though the member is already 
> initialized in the class declaration, so it appears they should be dropped.
> Clang from trunk incorrectly diagnoses the initializations in the initializer 
> lists as benign redundant braces in initialization of a scalar, but they 
> should be fixed for the reasons stated in MESOS-2925.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158030#comment-15158030
 ] 

Joseph Wu commented on MESOS-4676:
--

Confirmed that this is a Docker issue.  I fished out a command string from a 
failed test with {{GLOG_v=1}}, then ran it repeatedly on its own.

On Ubuntu12 (another place we're seeing the failure):
{code}
sh -c 'while true; do 
  docker -H unix:///var/run/docker.sock run --cpu-shares 2048 --memory 
1073741824 \
  -e MESOS_SANDBOX=/mnt/mesos/sandbox \
  -e 
MESOS_CONTAINER_NAME=mesos-3672e44c-2c92-48d5-825e-e8475227ad88-S0.bdb7f52c-5d3e-46f9-b676-4e693fb0d1f2
 \
  -v 
/tmp/DockerContainerizerTest_ROOT_DOCKER_Logs_vSwYXT/slaves/3672e44c-2c92-48d5-825e-e8475227ad88-S0/frameworks/3672e44c-2c92-48d5-825e-e8475227ad88-/executors/1/runs/bdb7f52c-5d3e-46f9-b676-4e693fb0d1f2:/mnt/mesos/sandbox
 \
  --net host \
  --entrypoint /bin/sh \
  alpine -c "echo outd5d895af-0c86-41bc-9f27-037ab12d8035 ; echo 
errd5d895af-0c86-41bc-9f27-037ab12d8035 1>&2"; 
done' 2>&1 | grep -v \
  -e "^outd5d895af-0c86-41bc-9f27-037ab12d8035$" \
  -e "^errd5d895af-0c86-41bc-9f27-037ab12d8035$" \
  -e "^WARNING: Your kernel does not support swap limit capabilities, memory 
limited without swap.$"
{code}

After about an hour (I don't know exactly how many iterations), I got the 
following output:
{code}
(outd5d895af-0c86-41bc-9f27-037abUnrecognized input header
(outd5d895af-0c86-41bc-9f27-037abUnrecognized input header
(errd5d895af-0c86-41bc-9f27-037abUnrecognized input header
(errd5d895af-0c86-41bc-9f27-037abUnrecognized input header
...
{code}

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>  Labels: flaky, mesosphere, test
>
> {noformat}
> [18:06:25][Step 8/8] [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Logs
> [18:06:25][Step 8/8] I0215 17:06:25.256103  1740 leveldb.cpp:174] Opened db 
> in 6.548327ms
> [18:06:25][Step 8/8] I0215 17:06:25.258002  1740 leveldb.cpp:181] Compacted 
> db in 1.837816ms
> [18:06:25][Step 8/8] I0215 17:06:25.258059  1740 leveldb.cpp:196] Created db 
> iterator in 22044ns
> [18:06:25][Step 8/8] I0215 17:06:25.258076  1740 leveldb.cpp:202] Seeked to 
> beginning of db in 2347ns
> [18:06:25][Step 8/8] I0215 17:06:25.258091  1740 leveldb.cpp:271] Iterated 
> through 0 keys in the db in 571ns
> [18:06:25][Step 8/8] I0215 17:06:25.258152  1740 replica.cpp:779] Replica 
> recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [18:06:25][Step 8/8] I0215 17:06:25.258936  1758 recover.cpp:447] Starting 
> replica recovery
> [18:06:25][Step 8/8] I0215 17:06:25.259177  1758 recover.cpp:473] Replica is 
> in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.260327  1757 replica.cpp:673] Replica in 
> EMPTY status received a broadcasted recover request from 
> (13608)@172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.260545  1758 recover.cpp:193] Received a 
> recover response from a replica in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.261065  1757 master.cpp:376] Master 
> 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started 
> on 172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.261209  1761 recover.cpp:564] Updating 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.261086  1757 master.cpp:378] Flags at 
> startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" 
> --zk_session_timeout="10secs"
> [18:06:25][Step 8/8] I0215 17:06:25.261446  1757 master.cpp:423] Master only 
> allowing authenticated frameworks to register
> [18:06:25][Step 8/8] I0215 17:06:25.261456  1757 master.cpp:428] Master only 
> allowing authenticated slaves to register
> [18:06:25][Step 8/8] I0215 17:06:25.261462  1757 credentials.hpp:35]

[jira] [Created] (MESOS-4740) Improve metrics/snapshot performace

2016-02-22 Thread Cong Wang (JIRA)
Cong Wang created MESOS-4740:


 Summary: Improve metrics/snapshot performace
 Key: MESOS-4740
 URL: https://issues.apache.org/jira/browse/MESOS-4740
 Project: Mesos
  Issue Type: Task
Reporter: Cong Wang
Assignee: Cong Wang


David Robinson noticed that retrieving metrics/snapshot statistics could be very 
inefficient and cause the Mesos master to get stuck.

{noformat}
[root@atla-bny-34-sr1 ~]# time curl -s localhost:5051/metrics/snapshot

real2m7.302s
user0m0.001s
sys0m0.004s
{noformat}

From a quick glance at the code, this *seems* to be because we sort all the 
values saved in the time series when calculating percentiles.

{noformat}
foreach (const typename TimeSeries::Value& value, values_) {
  values.push_back(value.data);
}

std::sort(values.begin(), values.end());
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4738) Make ingress and egress bandwidth a resource

2016-02-22 Thread Sargun Dhillon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157981#comment-15157981
 ] 

Sargun Dhillon commented on MESOS-4738:
---

We could build something to determine the bandwidth of the machine, or the 
kinds of NICs on the machine, to model bandwidth. I would say we should have a 
plan for implementing this resource, because it is in demand.

> Make ingress and egress bandwidth a resource
> 
>
> Key: MESOS-4738
> URL: https://issues.apache.org/jira/browse/MESOS-4738
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Minor
>  Labels: mesosphere
>
> Some of our users care about variable network isolation. Although we 
> cannot fundamentally limit ingress network bandwidth, having it as a 
> resource so that we can drop packets above a specific limit would be attractive. 
> It would be nice to expose egress and ingress bandwidth as an agent resource, 
> perhaps with a default of 10,000 Mbps, and we can allow people to adjust as 
> needed. Alternatively, a more advanced design would involve generating 
> heuristics based on an analysis of the network MII / PHY. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3007) Support systemd with Mesos

2016-02-22 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3007:

Summary: Support systemd with Mesos  (was: Support systemd with Mesos 
containerizer)

> Support systemd with Mesos
> --
>
> Key: MESOS-3007
> URL: https://issues.apache.org/jira/browse/MESOS-3007
> Project: Mesos
>  Issue Type: Epic
>Reporter: Artem Harutyunyan
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4738) Make ingress and egress bandwidth a resource

2016-02-22 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157959#comment-15157959
 ] 

Ian Downes commented on MESOS-4738:
---

[~adam-mesos] The port_mapping isolator does measure ingress and egress 
bandwidth per container (plus many other network related metrics) and can set 
egress limits (currently at the agent level but the actual isolating code is 
flexible). We are definitely interested in supporting bandwidth as a resource!

[~wangcong] for visibility.

> Make ingress and egress bandwidth a resource
> 
>
> Key: MESOS-4738
> URL: https://issues.apache.org/jira/browse/MESOS-4738
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Minor
>  Labels: mesosphere
>
> Some of our users care about variable network isolation. Although we 
> cannot fundamentally limit ingress network bandwidth, having it as a 
> resource so that we can drop packets above a specific limit would be attractive. 
> It would be nice to expose egress and ingress bandwidth as an agent resource, 
> perhaps with a default of 10,000 Mbps, and we can allow people to adjust as 
> needed. Alternatively, a more advanced design would involve generating 
> heuristics based on an analysis of the network MII / PHY. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4737) document TaskID uniqueness requirement

2016-02-22 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157920#comment-15157920
 ] 

Adam B commented on MESOS-4737:
---

Duplicate of MESOS-2198

> document TaskID uniqueness requirement
> --
>
> Key: MESOS-4737
> URL: https://issues.apache.org/jira/browse/MESOS-4737
> Project: Mesos
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.27.0
>Reporter: Erik Weathers
>Assignee: Erik Weathers
>Priority: Minor
>  Labels: documentation
> Attachments: Reusing Task IDs.pdf
>
>
> There are comments above the definition of TaskID in 
> [mesos.proto|https://github.com/apache/mesos/blob/0.27.0/include/mesos/mesos.proto#L63-L66]
>  which lead one to believe it is OK to reuse TaskID values so long as you 
> guarantee there will only ever be one task with that ID running at a time.
> {code: title=existing comments for TaskID}
>  * A framework generated ID to distinguish a task. The ID must remain
>  * unique while the task is active. However, a framework can reuse an
>  * ID _only_ if a previous task with the same ID has reached a
>  * terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
> {code}
> However, there are a few scenarios where problems can arise.
> # The checkpointing-and-recovery feature of mesos-slave/agent clashes with 
> tasks that reuse an ID and get assigned to the same executor.
> #* See [this 
> email|https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAO5KYW8%2BXMWc1dXtEo20BAsfGow028jwjL2ubMinP%2BK%2BvdOh8w%40mail.gmail.com%3E]
>  for more info, as well as the attachment on this issue.
> # Issues during network partitions and master failover, where a TaskID might 
> appear to be unique in the system, whereas in actuality another Task is 
> running with that ID and was just partitioned away for some time.
> In light of these issues, we should simply update the document(s) to make it 
> abundantly clear that reusing TaskIDs is never OK.  At a minimum this 
> should involve updating the aforementioned comments in {{mesos.proto}}.  
> Also, any framework development guides that talk about TaskID creation should 
> be updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4738) Make ingress and egress bandwidth a resource

2016-02-22 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157896#comment-15157896
 ] 

Adam B commented on MESOS-4738:
---

Have you tried this with custom resource types yet? Does that work for you?
I'm not sure we should first-class a new resource type unless we can measure 
and isolate it.

> Make ingress and egress bandwidth a resource
> 
>
> Key: MESOS-4738
> URL: https://issues.apache.org/jira/browse/MESOS-4738
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Minor
>  Labels: mesosphere
>
> Some of our users care about variable network isolation. Although we 
> cannot fundamentally limit ingress network bandwidth, having it as a 
> resource so that we can drop packets above a specific limit would be attractive. 
> It would be nice to expose egress and ingress bandwidth as an agent resource, 
> perhaps with a default of 10,000 Mbps, and we can allow people to adjust as 
> needed. Alternatively, a more advanced design would involve generating 
> heuristics based on an analysis of the network MII / PHY. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4739) libprocess CHECK failure in SlaveRecoveryTest/0.ReconnectHTTPExecutor

2016-02-22 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157869#comment-15157869
 ] 

Neil Conway commented on MESOS-4739:


cc [~anandmazumdar] [~mcypark]

> libprocess CHECK failure in SlaveRecoveryTest/0.ReconnectHTTPExecutor
> -
>
> Key: MESOS-4739
> URL: https://issues.apache.org/jira/browse/MESOS-4739
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, libprocess
>Reporter: Neil Conway
>  Labels: flaky-test, libprocess, mesosphere
>
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.ReconnectHTTPExecutor
> I0223 09:38:55.434953 11158 executor.cpp:172] Version: 0.28.0
> Received a SUBSCRIBED event
> Starting task 1
> Finishing task 1
> Received an ERROR event
> Received an ERROR event
> E0223 09:38:55.504820 11159 executor.cpp:553] End-Of-File received from 
> agent. The agent closed the event stream
> Received an ERROR event
> Received an ERROR event
> Received an ERROR event
> F0223 09:39:00.535778 22159 process.cpp:1114] Check failed: items.size() > 0
> *** Check failure stack trace: ***
> Received an ERROR event
> Received an ERROR event
> @ 0x7f4affd0e754  google::LogMessage::Fail()
> Received an ERROR event
> Received an ERROR event
> Received an ERROR event
> Received an ERROR event
> @ 0x7f4affd0e6ad  google::LogMessage::SendToLog()
> @ 0x7f4affd0e0a3  google::LogMessage::Flush()
> @ 0x7f4affd10f14  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f4affc618d4  process::HttpProxy::waited()
> @ 0x7f4affc8f57f  
> _ZZN7process8dispatchINS_9HttpProxyERKNS_6FutureINS_4http8ResponseEEES5_EEvRKNS_3PIDIT_EEMS9_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESI_
> @ 0x7f4affcac946  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchINS0_9HttpProxyERKNS0_6FutureINS0_4http8ResponseEEES9_EEvRKNS0_3PIDIT_EEMSD_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f4affc89961  std::function<>::operator()()
> @ 0x7f4affc6ef02  process::ProcessBase::visit()
> @ 0x7f4affc74e52  process::DispatchEvent::visit()
> @   0xa3afe8  process::ProcessBase::serve()
> @ 0x7f4affc6b073  process::ProcessManager::resume()
> @ 0x7f4affc6813b  
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt6atomicIbEE_clES4_
> @ 0x7f4affc745fa  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS4_EEE6__callIvJEJLm0T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @ 0x7f4affc745a8  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS4_EEEclIJEvEET0_DpOT_
> @ 0x7f4affc74556  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS5_EEEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7f4affc744bf  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS5_EEEvEEclEv
> @ 0x7f4affc7445e  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS7_EEEvEEE6_M_runEv
> @ 0x7f4afa6ddc40  execute_native_thread_routine
> @ 0x7f4afadba424  start_thread
> @ 0x7f4af9e50cbd  __clone
> @  (nil)  (unknown)
> Aborted (core dumped)
> {noformat}
> This crash was observed in a recent ArchLinux VM (Virtualbox), running 
> concurrently with {{stress --cpu 4}}. Repro'd with {{./src/mesos-tests 
> --gtest_filter="SlaveRecovery*" --gtest_repeat=100 
> --gtest_break_on_failure}}; took about 20 iterations to trigger a crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2717) Qemu/KVM containerizer

2016-02-22 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157860#comment-15157860
 ] 

James Peach commented on MESOS-2717:


Not really. It is still on my TODO list, but I'd be happy to pass it on if 
someone else can work on it immediately :)

> Qemu/KVM containerizer
> --
>
> Key: MESOS-2717
> URL: https://issues.apache.org/jira/browse/MESOS-2717
> Project: Mesos
>  Issue Type: Wish
>  Components: containerization
>Reporter: Pierre-Yves Ritschard
>Assignee: James Peach
>
> I think it would make sense for Mesos to have the ability to treat 
> hypervisors as containerizers and the most sensible one to start with would 
> probably be Qemu/KVM.
> There are a few workloads that can require full-fledged VMs (the most obvious 
> one being Windows workloads).
> The containerization code is well decoupled and seems simple enough; I can 
> definitely take a shot at it. VMs do bring some questions with them; here is 
> my take on them:
> 1. Routing, network strategy
> ==
> The simplest approach here might very well be to go for bridged networks
> and leave the setup and inter slave routing up to the administrator
> 2. IP Address assignment
> 
> At first, it can be up to the frameworks to deal with IP assignment.
> The simplest way to address this could be to have an executor running
> on slaves providing the qemu/kvm containerizer, which would instrument a DHCP 
> server and collect IP + MAC address resources from slaves. While it may be up 
> to the frameworks to provide this, an example should most likely be provided.
> 3. VM Templates
> ==
> VM templates should probably leverage the fetcher and could thus be copied 
> locally or fetched from HTTP(S) / HDFS.
> 4. Resource limiting
> 
> Mapping resource constraints to the qemu command line is probably the easiest 
> part. Additional command-line arguments should also be fetchable. For Unix 
> VMs, the sandbox could show the output of the serial console.
> 5. Libvirt / plain Qemu
> =
> I tend to favor limiting the number of necessary hoops to jump through and 
> would thus investigate working directly with Qemu, maintaining an open 
> connection to the monitor to assert status.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4739) libprocess CHECK failure in SlaveRecoveryTest/0.ReconnectHTTPExecutor

2016-02-22 Thread Neil Conway (JIRA)
Neil Conway created MESOS-4739:
--

 Summary: libprocess CHECK failure in 
SlaveRecoveryTest/0.ReconnectHTTPExecutor
 Key: MESOS-4739
 URL: https://issues.apache.org/jira/browse/MESOS-4739
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API, libprocess
Reporter: Neil Conway


{noformat}
[ RUN  ] SlaveRecoveryTest/0.ReconnectHTTPExecutor
I0223 09:38:55.434953 11158 executor.cpp:172] Version: 0.28.0
Received a SUBSCRIBED event
Starting task 1
Finishing task 1
Received an ERROR event
Received an ERROR event
E0223 09:38:55.504820 11159 executor.cpp:553] End-Of-File received from agent. 
The agent closed the event stream
Received an ERROR event
Received an ERROR event
Received an ERROR event
F0223 09:39:00.535778 22159 process.cpp:1114] Check failed: items.size() > 0
*** Check failure stack trace: ***
Received an ERROR event
Received an ERROR event
@ 0x7f4affd0e754  google::LogMessage::Fail()
Received an ERROR event
Received an ERROR event
Received an ERROR event
Received an ERROR event
@ 0x7f4affd0e6ad  google::LogMessage::SendToLog()
@ 0x7f4affd0e0a3  google::LogMessage::Flush()
@ 0x7f4affd10f14  google::LogMessageFatal::~LogMessageFatal()
@ 0x7f4affc618d4  process::HttpProxy::waited()
@ 0x7f4affc8f57f  
_ZZN7process8dispatchINS_9HttpProxyERKNS_6FutureINS_4http8ResponseEEES5_EEvRKNS_3PIDIT_EEMS9_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESI_
@ 0x7f4affcac946  
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchINS0_9HttpProxyERKNS0_6FutureINS0_4http8ResponseEEES9_EEvRKNS0_3PIDIT_EEMSD_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
@ 0x7f4affc89961  std::function<>::operator()()
@ 0x7f4affc6ef02  process::ProcessBase::visit()
@ 0x7f4affc74e52  process::DispatchEvent::visit()
@   0xa3afe8  process::ProcessBase::serve()
@ 0x7f4affc6b073  process::ProcessManager::resume()
@ 0x7f4affc6813b  
_ZZN7process14ProcessManager12init_threadsEvENKUlRKSt6atomicIbEE_clES4_
@ 0x7f4affc745fa  
_ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS4_EEE6__callIvJEJLm0T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
@ 0x7f4affc745a8  
_ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS4_EEEclIJEvEET0_DpOT_
@ 0x7f4affc74556  
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS5_EEEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
@ 0x7f4affc744bf  
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS5_EEEvEEclEv
@ 0x7f4affc7445e  
_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt6atomicIbEE_St17reference_wrapperIS7_EEEvEEE6_M_runEv
@ 0x7f4afa6ddc40  execute_native_thread_routine
@ 0x7f4afadba424  start_thread
@ 0x7f4af9e50cbd  __clone
@  (nil)  (unknown)
Aborted (core dumped)
{noformat}

This crash was observed in a recent ArchLinux VM (Virtualbox), running 
concurrently with {{stress --cpu 4}}. Repro'd with {{./src/mesos-tests 
--gtest_filter="SlaveRecovery*" --gtest_repeat=100 --gtest_break_on_failure}}; 
took about 20 iterations to trigger a crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4738) Make ingress and egress bandwidth a resource

2016-02-22 Thread Sargun Dhillon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sargun Dhillon updated MESOS-4738:
--
Description: 
Some of our users care about variable network isolation. Although we cannot 
fundamentally limit ingress network bandwidth, having it as a resource so that 
we can drop packets above a specific limit would be attractive. 

It would be nice to expose egress and ingress bandwidth as an agent resource, 
perhaps with a default of 10,000 Mbps, and we can allow people to adjust as 
needed. Alternatively, a more advanced design would involve generating 
heuristics based on an analysis of the network MII / PHY. 



  was:
Some of our users care about variable network isolation. Although we cannot 
fundamentally limit ingress network bandwidth, having it as a resource so that 
we can drop packets above a specific limit would be attractive. 




> Make ingress and egress bandwidth a resource
> 
>
> Key: MESOS-4738
> URL: https://issues.apache.org/jira/browse/MESOS-4738
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Minor
>  Labels: mesosphere
>
> Some of our users care about variable network isolation. Although we 
> cannot fundamentally limit ingress network bandwidth, having it as a 
> resource so that we can drop packets above a specific limit would be attractive. 
> It would be nice to expose egress and ingress bandwidth as an agent resource, 
> perhaps with a default of 10,000 Mbps, and we can allow people to adjust as 
> needed. Alternatively, a more advanced design would involve generating 
> heuristics based on an analysis of the network MII / PHY. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4738) Make ingress and egress bandwidth a resource

2016-02-22 Thread Sargun Dhillon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sargun Dhillon updated MESOS-4738:
--
Labels: mesosphere  (was: )

> Make ingress and egress bandwidth a resource
> 
>
> Key: MESOS-4738
> URL: https://issues.apache.org/jira/browse/MESOS-4738
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Minor
>  Labels: mesosphere
>
> Some of our users care about variable network isolation. Although we 
> cannot fundamentally limit ingress network bandwidth, exposing it as a 
> resource, so that we can drop packets above a specific limit, would be 
> attractive. 
> It would be nice to expose egress and ingress bandwidth as an agent resource, 
> perhaps with a default of 10,000 Mbps, and we can allow people to adjust as 
> needed. Alternatively, a more advanced design would involve generating 
> heuristics based on an analysis of the network MII / PHY. 





[jira] [Assigned] (MESOS-4736) DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes is flaky

2016-02-22 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-4736:


Assignee: Joseph Wu

> DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes is flaky
> 
>
> Key: MESOS-4736
> URL: https://issues.apache.org/jira/browse/MESOS-4736
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0
> Environment: Centos6 + GCC 4.9 on AWS
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: flaky, mesosphere, test
>
> This test passes consistently on other OS's, but fails consistently on CentOS 
> 6.
> Verbose logs from test failure:
> {code}
> [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes
> I0222 18:16:12.327957 26681 leveldb.cpp:174] Opened db in 7.466102ms
> I0222 18:16:12.330528 26681 leveldb.cpp:181] Compacted db in 2.540139ms
> I0222 18:16:12.330580 26681 leveldb.cpp:196] Created db iterator in 16908ns
> I0222 18:16:12.330592 26681 leveldb.cpp:202] Seeked to beginning of db in 
> 1403ns
> I0222 18:16:12.330600 26681 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 315ns
> I0222 18:16:12.330634 26681 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0222 18:16:12.331082 26698 recover.cpp:447] Starting replica recovery
> I0222 18:16:12.331289 26698 recover.cpp:473] Replica is in EMPTY status
> I0222 18:16:12.332162 26703 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (13761)@172.30.2.148:35274
> I0222 18:16:12.332701 26701 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0222 18:16:12.333230 26699 recover.cpp:564] Updating replica status to 
> STARTING
> I0222 18:16:12.334102 26698 master.cpp:376] Master 
> 652149b4-3932-4d8b-ba6f-8c9d9045be70 (ip-172-30-2-148.mesosphere.io) started 
> on 172.30.2.148:35274
> I0222 18:16:12.334116 26698 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/QEhLBS/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/QEhLBS/master" 
> --zk_session_timeout="10secs"
> I0222 18:16:12.334354 26698 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0222 18:16:12.334363 26698 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0222 18:16:12.334369 26698 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/QEhLBS/credentials'
> I0222 18:16:12.335366 26698 master.cpp:468] Using default 'crammd5' 
> authenticator
> I0222 18:16:12.335492 26698 master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0222 18:16:12.335623 26698 master.cpp:571] Authorization enabled
> I0222 18:16:12.335752 26703 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 2.314693ms
> I0222 18:16:12.335769 26700 whitelist_watcher.cpp:77] No whitelist given
> I0222 18:16:12.335778 26703 replica.cpp:320] Persisted replica status to 
> STARTING
> I0222 18:16:12.335821 26697 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0222 18:16:12.335965 26701 recover.cpp:473] Replica is in STARTING status
> I0222 18:16:12.336771 26703 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (13763)@172.30.2.148:35274
> I0222 18:16:12.337191 26696 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0222 18:16:12.337635 26700 recover.cpp:564] Updating replica status to VOTING
> I0222 18:16:12.337671 26703 master.cpp:1712] The newly elected leader is 
> master@172.30.2.148:35274 with id 652149b4-3932-4d8b-ba6f-8c9d9045be70
> I0222 18:16:12.337698 26703 master.cpp:1725] Elected as the leading master!
> I0222 18:16:12.337713 26703 master.cpp:1470] Recovering from registrar
> I0222 18:16:12.337828 26696 registrar.cpp:307] Recovering registrar
> I0222 18:16:12.339972 26702 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 2.060

[jira] [Created] (MESOS-4738) Make ingress and egress bandwidth a resource

2016-02-22 Thread Sargun Dhillon (JIRA)
Sargun Dhillon created MESOS-4738:
-

 Summary: Make ingress and egress bandwidth a resource
 Key: MESOS-4738
 URL: https://issues.apache.org/jira/browse/MESOS-4738
 Project: Mesos
  Issue Type: Improvement
Reporter: Sargun Dhillon
Priority: Minor


Some of our users care about variable network isolation. Although we 
cannot fundamentally limit ingress network bandwidth, exposing it as a 
resource, so that we can drop packets above a specific limit, would be 
attractive. 







[jira] [Updated] (MESOS-4737) document TaskID uniqueness requirement

2016-02-22 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated MESOS-4737:
-
Attachment: Reusing Task IDs.pdf

> document TaskID uniqueness requirement
> --
>
> Key: MESOS-4737
> URL: https://issues.apache.org/jira/browse/MESOS-4737
> Project: Mesos
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.27.0
>Reporter: Erik Weathers
>Assignee: Erik Weathers
>Priority: Minor
>  Labels: documentation
> Attachments: Reusing Task IDs.pdf
>
>
> There are comments above the definition of TaskID in 
> [mesos.proto|https://github.com/apache/mesos/blob/0.27.0/include/mesos/mesos.proto#L63-L66]
>  which lead one to believe it is ok to reuse TaskID values so long as you 
> guarantee there will only ever be 1 such TaskID running at the same time.
> {code: title=existing comments for TaskID}
>  * A framework generated ID to distinguish a task. The ID must remain
>  * unique while the task is active. However, a framework can reuse an
>  * ID _only_ if a previous task with the same ID has reached a
>  * terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
> {code}
> However, there are a few scenarios where problems can arise.
> # The checkpointing-and-recovery feature of mesos-slave/agent clashes with 
> tasks that reuse an ID and get assigned to the same executor.
> #* See [this 
> email|https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAO5KYW8%2BXMWc1dXtEo20BAsfGow028jwjL2ubMinP%2BK%2BvdOh8w%40mail.gmail.com%3E]
>  for more info, as well as the attachment on this issue.
> # Issues during network partitions and master failover, where a TaskID might 
> appear to be unique in the system, whereas in actuality another Task is 
> running with that ID and was just partitioned away for some time.
> In light of these issues, we should simply update the document(s) to make it 
> abundantly clear that reusing TaskIDs is never OK.  At a minimum, this 
> should involve updating the aforementioned comments in {{mesos.proto}}.  
> Also, any framework development guides that talk about TaskID creation should 
> be updated.





[jira] [Created] (MESOS-4737) document TaskID uniqueness requirement

2016-02-22 Thread Erik Weathers (JIRA)
Erik Weathers created MESOS-4737:


 Summary: document TaskID uniqueness requirement
 Key: MESOS-4737
 URL: https://issues.apache.org/jira/browse/MESOS-4737
 Project: Mesos
  Issue Type: Task
  Components: documentation
Affects Versions: 0.27.0
Reporter: Erik Weathers
Assignee: Erik Weathers
Priority: Minor


There are comments above the definition of TaskID in 
[mesos.proto|https://github.com/apache/mesos/blob/0.27.0/include/mesos/mesos.proto#L63-L66]
 which lead one to believe it is ok to reuse TaskID values so long as you 
guarantee there will only ever be 1 such TaskID running at the same time.

{code title=existing comments for TaskID}
 * A framework generated ID to distinguish a task. The ID must remain
 * unique while the task is active. However, a framework can reuse an
 * ID _only_ if a previous task with the same ID has reached a
 * terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
{code}

However, there are a few scenarios where problems can arise.

# The checkpointing-and-recovery feature of mesos-slave/agent clashes with 
tasks that reuse an ID and get assigned to the same executor.
#* See [this 
email|https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAO5KYW8%2BXMWc1dXtEo20BAsfGow028jwjL2ubMinP%2BK%2BvdOh8w%40mail.gmail.com%3E]
 for more info, as well as the attachment on this issue.
# Issues during network partitions and master failover, where a TaskID might 
appear to be unique in the system, whereas in actuality another Task is running 
with that ID and was just partitioned away for some time.

In light of these issues, we should simply update the document(s) to make it 
abundantly clear that reusing TaskIDs is never OK.  At a minimum, this should 
involve updating the aforementioned comments in {{mesos.proto}}.  Also, any 
framework development guides that talk about TaskID creation should be updated.
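
Given these hazards, the simplest framework-side fix is to never reuse an ID 
at all, e.g. by suffixing a UUID. A minimal Python sketch ({{make_task_id}} is 
a hypothetical helper, not part of any Mesos API):

```python
import uuid

def make_task_id(prefix):
    """Return a task ID that is unique for the framework's lifetime.

    Appending a UUID means an ID is never reused, which sidesteps both
    the agent checkpoint/recovery clash and the partition/failover
    ambiguity described above. (Hypothetical helper, for illustration.)
    """
    return "%s-%s" % (prefix, uuid.uuid4())

a = make_task_id("web")
b = make_task_id("web")
assert a != b  # each call yields a fresh, never-reused ID
```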





[jira] [Updated] (MESOS-4737) document TaskID uniqueness requirement

2016-02-22 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated MESOS-4737:
-
Description: 
There are comments above the definition of TaskID in 
[mesos.proto|https://github.com/apache/mesos/blob/0.27.0/include/mesos/mesos.proto#L63-L66]
 which lead one to believe it is ok to reuse TaskID values so long as you 
guarantee there will only ever be 1 such TaskID running at the same time.

{code: title=existing comments for TaskID}
 * A framework generated ID to distinguish a task. The ID must remain
 * unique while the task is active. However, a framework can reuse an
 * ID _only_ if a previous task with the same ID has reached a
 * terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
{code}

However, there are a few scenarios where problems can arise.

# The checkpointing-and-recovery feature of mesos-slave/agent clashes with 
tasks that reuse an ID and get assigned to the same executor.
#* See [this 
email|https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAO5KYW8%2BXMWc1dXtEo20BAsfGow028jwjL2ubMinP%2BK%2BvdOh8w%40mail.gmail.com%3E]
 for more info, as well as the attachment on this issue.
# Issues during network partitions and master failover, where a TaskID might 
appear to be unique in the system, whereas in actuality another Task is running 
with that ID and was just partitioned away for some time.

In light of these issues, we should simply update the document(s) to make it 
abundantly clear that reusing TaskIDs is never OK.  At a minimum, this should 
involve updating the aforementioned comments in {{mesos.proto}}.  Also, any 
framework development guides that talk about TaskID creation should be updated.

  was:
There are comments above the definition of TaskID in 
[mesos.proto|https://github.com/apache/mesos/blob/0.27.0/include/mesos/mesos.proto#L63-L66]
 which lead one to believe it is ok to reuse TaskID values so long as you 
guarantee there will only ever be 1 such TaskID running at the same time.

{code title=existing comments for TaskID}
 * A framework generated ID to distinguish a task. The ID must remain
 * unique while the task is active. However, a framework can reuse an
 * ID _only_ if a previous task with the same ID has reached a
 * terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
{code}

However, there are a few scenarios where problems can arise.

# The checkpointing-and-recovery feature of mesos-slave/agent clashes with 
tasks that reuse an ID and get assigned to the same executor.
#* See [this 
email|https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAO5KYW8%2BXMWc1dXtEo20BAsfGow028jwjL2ubMinP%2BK%2BvdOh8w%40mail.gmail.com%3E]
 for more info, as well as the attachment on this issue.
# Issues during network partitions and master failover, where a TaskID might 
appear to be unique in the system, whereas in actuality another Task is running 
with that ID and was just partitioned away for some time.

In light of these issues, we should simply update the document(s) to make it 
abundantly clear that reusing TaskIDs is never OK.  At a minimum, this should 
involve updating the aforementioned comments in {{mesos.proto}}.  Also, any 
framework development guides that talk about TaskID creation should be updated.


> document TaskID uniqueness requirement
> --
>
> Key: MESOS-4737
> URL: https://issues.apache.org/jira/browse/MESOS-4737
> Project: Mesos
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.27.0
>Reporter: Erik Weathers
>Assignee: Erik Weathers
>Priority: Minor
>  Labels: documentation
>
> There are comments above the definition of TaskID in 
> [mesos.proto|https://github.com/apache/mesos/blob/0.27.0/include/mesos/mesos.proto#L63-L66]
>  which lead one to believe it is ok to reuse TaskID values so long as you 
> guarantee there will only ever be 1 such TaskID running at the same time.
> {code: title=existing comments for TaskID}
>  * A framework generated ID to distinguish a task. The ID must remain
>  * unique while the task is active. However, a framework can reuse an
>  * ID _only_ if a previous task with the same ID has reached a
>  * terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
> {code}
> However, there are a few scenarios where problems can arise.
> # The checkpointing-and-recovery feature of mesos-slave/agent clashes with 
> tasks that reuse an ID and get assigned to the same executor.
> #* See [this 
> email|https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAO5KYW8%2BXMWc1dXtEo20BAsfGow028jwjL2ubMinP%2BK%2BvdOh8w%40mail.gmail.com%3E]
>  for more info, as well as the attachment on this issue.
> # Issues during network partitions and master failover, where a TaskID might 
> appear t

[jira] [Commented] (MESOS-4047) MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky

2016-02-22 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157704#comment-15157704
 ] 

Alexander Rojas commented on MESOS-4047:


Reproduced again with the following message (CentOS 6.7):

{noformat}
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from MemoryPressureMesosTest
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.000394345 s, 2.7 GB/s
[ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
I0222 09:32:20.622694 20868 leveldb.cpp:174] Opened db in 5.153509ms
I0222 09:32:20.624688 20868 leveldb.cpp:181] Compacted db in 1.914323ms
I0222 09:32:20.624778 20868 leveldb.cpp:196] Created db iterator in 24549ns
I0222 09:32:20.624795 20868 leveldb.cpp:202] Seeked to beginning of db in 2610ns
I0222 09:32:20.624804 20868 leveldb.cpp:271] Iterated through 0 keys in the db 
in 323ns
I0222 09:32:20.624874 20868 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0222 09:32:20.625977 20888 recover.cpp:447] Starting replica recovery
I0222 09:32:20.626901 20888 recover.cpp:473] Replica is in EMPTY status
I0222 09:32:20.634701 20889 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (11193)@127.0.0.1:54769
I0222 09:32:20.634953 20888 master.cpp:376] Master 
17b7da64-0c4d-4e46-ae1f-2b356dc5f266 (localhost) started on 127.0.0.1:54769
I0222 09:32:20.634986 20888 master.cpp:378] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/0rXncF/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/0rXncF/master" 
--zk_session_timeout="10secs"
W0222 09:32:20.635417 20888 master.cpp:381]
**
Master bound to loopback interface! Cannot communicate with remote schedulers 
or slaves. You might want to set '--ip' flag to a routable IP address.
**
I0222 09:32:20.635587 20888 master.cpp:423] Master only allowing authenticated 
frameworks to register
I0222 09:32:20.635601 20888 master.cpp:428] Master only allowing authenticated 
slaves to register
I0222 09:32:20.635622 20888 credentials.hpp:35] Loading credentials for 
authentication from '/tmp/0rXncF/credentials'
I0222 09:32:20.636018 20888 master.cpp:468] Using default 'crammd5' 
authenticator
I0222 09:32:20.636190 20888 master.cpp:537] Using default 'basic' HTTP 
authenticator
I0222 09:32:20.636174 20887 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I0222 09:32:20.636425 20888 master.cpp:571] Authorization enabled
I0222 09:32:20.637810 20885 recover.cpp:564] Updating replica status to STARTING
I0222 09:32:20.640805 20887 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 2.741248ms
I0222 09:32:20.640964 20887 replica.cpp:320] Persisted replica status to 
STARTING
I0222 09:32:20.641525 20885 recover.cpp:473] Replica is in STARTING status
I0222 09:32:20.642133 20888 master.cpp:1712] The newly elected leader is 
master@127.0.0.1:54769 with id 17b7da64-0c4d-4e46-ae1f-2b356dc5f266
I0222 09:32:20.642236 20888 master.cpp:1725] Elected as the leading master!
I0222 09:32:20.642253 20888 master.cpp:1470] Recovering from registrar
I0222 09:32:20.642496 20885 registrar.cpp:307] Recovering registrar
I0222 09:32:20.643162 20889 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (11195)@127.0.0.1:54769
I0222 09:32:20.643590 20885 recover.cpp:193] Received a recover response from a 
replica in STARTING status
I0222 09:32:20.644120 20887 recover.cpp:564] Updating replica status to VOTING
I0222 09:32:20.646817 20889 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 1.190281ms
I0222 09:32:20.646870 20889 replica.cpp:320] Persisted replica status to VOTING
I0222 09:32:20.647094 20885 recover.cpp:578] Successfully joined the Paxos group
I0222 09:32:20.647337 20885 recover.cpp:462] Recover process terminated
I0222 09:32:20.647781 20887 log.cpp:659] Attempting to start the writer
I0222 09:32:20.648854 20890 replica.cpp:493] Repli

[jira] [Updated] (MESOS-4686) Implement master failover tests for the scheduler library.

2016-02-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4686:
--
Sprint: Mesosphere Sprint 29

Review Chain: https://reviews.apache.org/r/43846/

> Implement master failover tests for the scheduler library.
> --
>
> Key: MESOS-4686
> URL: https://issues.apache.org/jira/browse/MESOS-4686
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, the scheduler library creates its own {{MasterDetector}} object 
> internally. We would need to create a standalone detector and create new 
> tests for testing that callbacks are invoked correctly in the event of a 
> master failover.





[jira] [Created] (MESOS-4736) DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes is flaky

2016-02-22 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-4736:


 Summary: 
DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes is flaky
 Key: MESOS-4736
 URL: https://issues.apache.org/jira/browse/MESOS-4736
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.28.0
 Environment: Centos6 + GCC 4.9 on AWS
Reporter: Joseph Wu


This test passes consistently on other OS's, but fails consistently on CentOS 6.

Verbose logs from test failure:
{code}
[ RUN  ] DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes
I0222 18:16:12.327957 26681 leveldb.cpp:174] Opened db in 7.466102ms
I0222 18:16:12.330528 26681 leveldb.cpp:181] Compacted db in 2.540139ms
I0222 18:16:12.330580 26681 leveldb.cpp:196] Created db iterator in 16908ns
I0222 18:16:12.330592 26681 leveldb.cpp:202] Seeked to beginning of db in 1403ns
I0222 18:16:12.330600 26681 leveldb.cpp:271] Iterated through 0 keys in the db 
in 315ns
I0222 18:16:12.330634 26681 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0222 18:16:12.331082 26698 recover.cpp:447] Starting replica recovery
I0222 18:16:12.331289 26698 recover.cpp:473] Replica is in EMPTY status
I0222 18:16:12.332162 26703 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (13761)@172.30.2.148:35274
I0222 18:16:12.332701 26701 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I0222 18:16:12.333230 26699 recover.cpp:564] Updating replica status to STARTING
I0222 18:16:12.334102 26698 master.cpp:376] Master 
652149b4-3932-4d8b-ba6f-8c9d9045be70 (ip-172-30-2-148.mesosphere.io) started on 
172.30.2.148:35274
I0222 18:16:12.334116 26698 master.cpp:378] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/QEhLBS/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/QEhLBS/master" 
--zk_session_timeout="10secs"
I0222 18:16:12.334354 26698 master.cpp:423] Master only allowing authenticated 
frameworks to register
I0222 18:16:12.334363 26698 master.cpp:428] Master only allowing authenticated 
slaves to register
I0222 18:16:12.334369 26698 credentials.hpp:35] Loading credentials for 
authentication from '/tmp/QEhLBS/credentials'
I0222 18:16:12.335366 26698 master.cpp:468] Using default 'crammd5' 
authenticator
I0222 18:16:12.335492 26698 master.cpp:537] Using default 'basic' HTTP 
authenticator
I0222 18:16:12.335623 26698 master.cpp:571] Authorization enabled
I0222 18:16:12.335752 26703 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 2.314693ms
I0222 18:16:12.335769 26700 whitelist_watcher.cpp:77] No whitelist given
I0222 18:16:12.335778 26703 replica.cpp:320] Persisted replica status to 
STARTING
I0222 18:16:12.335821 26697 hierarchical.cpp:144] Initialized hierarchical 
allocator process
I0222 18:16:12.335965 26701 recover.cpp:473] Replica is in STARTING status
I0222 18:16:12.336771 26703 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (13763)@172.30.2.148:35274
I0222 18:16:12.337191 26696 recover.cpp:193] Received a recover response from a 
replica in STARTING status
I0222 18:16:12.337635 26700 recover.cpp:564] Updating replica status to VOTING
I0222 18:16:12.337671 26703 master.cpp:1712] The newly elected leader is 
master@172.30.2.148:35274 with id 652149b4-3932-4d8b-ba6f-8c9d9045be70
I0222 18:16:12.337698 26703 master.cpp:1725] Elected as the leading master!
I0222 18:16:12.337713 26703 master.cpp:1470] Recovering from registrar
I0222 18:16:12.337828 26696 registrar.cpp:307] Recovering registrar
I0222 18:16:12.339972 26702 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 2.06039ms
I0222 18:16:12.339994 26702 replica.cpp:320] Persisted replica status to VOTING
I0222 18:16:12.340082 26700 recover.cpp:578] Successfully joined the Paxos group
I0222 18:16:12.340267 26700 recover.cpp:462] Recover process terminated
I0222 18:16:12.340591 26699 log.cpp:659] Attempting to start the writer
I0222 18:16:12.341594 26698 replica.cpp:493] Replica received implicit promise 
request from (13764)@172.30.2.148:35274 with proposal 1

[jira] [Commented] (MESOS-4642) Mesos Agent Json API can dump binary data from log files out as invalid JSON

2016-02-22 Thread Chris Pennello (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157470#comment-15157470
 ] 

Chris Pennello commented on MESOS-4642:
---

bq. In my opinion, encoding everything as UTF-8 would be an incorrect approach. 
Doing this introduces ambiguity with regards to the original data.
Indeed, without an extra encoding scheme, this seems like an unavoidable 
consequence of using JSON.
bq. For us to actually output valid JSON, we need to encode the output as 
unicode.
I think it's a little worse than that; per the above, there simply exists data 
that is unrepresentable.

Clients still do have {{files/read}} to use if they want the raw data, right?

One idea is to add extra, Unicode-friendly encoding to {{files/read.json}} for 
raw data.  For example, it could be Base64-encoded and _then_ dumped to JSON.

Maybe as a more client-friendly idea, perhaps we could augment 
{{files/read.json}} such that sequences of bytes that can't be interpreted as 
UTF-8 encoded Unicode are replaced by a {{?}} character?  [(This is kind of 
akin to Python's {{unicode(..., 
errors='replace')}}.)|https://docs.python.org/2/howto/unicode.html#the-unicode-type]
  That way, we'd be able to get valid JSON out of {{files/read.json}} (a 
plus!), and have "reasonable" behavior for unrepresentable data.
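
A minimal sketch of that replacement idea, assuming Python 3 semantics (note 
that {{errors='replace'}} substitutes U+FFFD, the Unicode replacement 
character, rather than a literal {{?}}):

```python
import json

# Raw stderr bytes containing an invalid UTF-8 start byte (0xac), as in
# the hexdump in the issue description.
raw = b"onread ENOENT 295456 251 295707\n\xacWed, 10 Unrecognized input header\n"

# Decoding with errors='replace' swaps each invalid byte for U+FFFD, so
# the result is always representable and serializes to valid JSON.
text = raw.decode("utf-8", errors="replace")
payload = json.dumps({"data": text, "offset": 220443})

# The round trip now succeeds, whereas dumping the raw bytes verbatim
# produced unparseable output.
assert json.loads(payload)["offset"] == 220443
```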

As a wild idea, perhaps if we still wanted endpoints that could represent 
arbitrary, but _structured_ data, we might consider adding an additional 
serialization format, such as [MessagePack|http://msgpack.org/].

> Mesos Agent Json API can dump binary data from log files out as invalid JSON
> 
>
> Key: MESOS-4642
> URL: https://issues.apache.org/jira/browse/MESOS-4642
> Project: Mesos
>  Issue Type: Bug
>  Components: json api, slave
>Affects Versions: 0.27.0
>Reporter: Steven Schlansker
>Priority: Critical
>
> One of our tasks accidentally started logging binary data to stderr.  This 
> was not intentional and generally should not happen -- however, it causes 
> severe problems with the Mesos Agent "files/read.json" API, since it gladly 
> dumps this binary data out as invalid JSON.
> {code}
> # hexdump -C /path/to/task/stderr | tail
> 0003d1f0  6f 6e 6e 65 63 74 69 6f  6e 0a 4e 45 54 3a 20 31  |onnection.NET: 1|
> 0003d200  20 6f 6e 72 65 61 64 20  45 4e 4f 45 4e 54 20 32  | onread ENOENT 2|
> 0003d210  39 35 34 35 36 20 32 35  31 20 32 39 35 37 30 37  |95456 251 295707|
> 0003d220  0a 01 00 00 00 00 00 00  ac 57 65 64 2c 20 31 30  |.Wed, 10|
> 0003d230  20 55 6e 72 65 63 6f 67  6e 69 7a 65 64 20 69 6e  | Unrecognized in|
> 0003d240  70 75 74 20 68 65 61 64  65 72 0a |put header.|
> {code}
> {code}
> # curl 
> 'http://agent-host:5051/files/read.json?path=/path/to/task/stderr&offset=220443&length=9&grep='
>  | hexdump -C
> 7970  6e 65 63 74 69 6f 6e 5c  6e 4e 45 54 3a 20 31 20  |nection\nNET: 1 |
> 7980  6f 6e 72 65 61 64 20 45  4e 4f 45 4e 54 20 32 39  |onread ENOENT 29|
> 7990  35 34 35 36 20 32 35 31  20 32 39 35 37 30 37 5c  |5456 251 295707\|
> 79a0  6e 5c 75 30 30 30 31 5c  75 30 30 30 30 5c 75 30  |n\u0001\u\u0|
> 79b0  30 30 30 5c 75 30 30 30  30 5c 75 30 30 30 30 5c  |000\u\u\|
> 79c0  75 30 30 30 30 5c 75 30  30 30 30 ac 57 65 64 2c  |u\u.Wed,|
> 79d0  20 31 30 20 55 6e 72 65  63 6f 67 6e 69 7a 65 64  | 10 Unrecognized|
> 79e0  20 69 6e 70 75 74 20 68  65 61 64 65 72 5c 6e 22  | input header\n"|
> 79f0  2c 22 6f 66 66 73 65 74  22 3a 32 32 30 34 34 33  |,"offset":220443|
> 7a00  7d|}|
> {code}
> This causes downstream sadness:
> {code}
> ERROR [2016-02-10 18:55:12,303] 
> io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: 
> 0ee749630f8b26f1
> ! com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xac
> !  at [Source: org.jboss.netty.buffer.ChannelBufferInputStream@6d69ee8; line: 
> 1, column: 31181]
> ! at 
> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1487) 
> ~[singularity-0.4.9.jar:0.4.9]
> ! at 
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518)
>  ~[singularity-0.4.9.jar:0.4.9]
> ! at 
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3339)
>  ~[singularity-0.4.9.jar:0.4.9]
> ! at 
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidChar(UTF8StreamJsonParser.java:)
>  ~[singularity-0.4.9.jar:0.4.9]
> ! at 
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2360)
>  ~[singularity-0.4.9.jar:0.4.9]
> ! at 
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8Strea

[jira] [Assigned] (MESOS-4700) Allow agent to configure net_cls handle minor range.

2016-02-22 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan reassigned MESOS-4700:


Assignee: Avinash Sridharan

> Allow agent to configure net_cls handle minor range.
> 
>
> Key: MESOS-4700
> URL: https://issues.apache.org/jira/browse/MESOS-4700
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Bugs exist in some user libraries that prevent certain minor net_cls 
> handles from being used. It would be great if we could configure the minor 
> range through agent flags.
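For reference, the net_cls cgroup encodes a container's classid as a single 32-bit value, with the major handle in the upper 16 bits and the minor handle in the lower 16. A rough sketch of how an agent-side minor-range restriction could work (the flag name and range values below are purely illustrative, not the actual Mesos API):

```python
def net_cls_classid(major: int, minor: int) -> int:
    # Pack a net_cls classid: upper 16 bits = major handle,
    # lower 16 bits = minor handle.
    assert 0 <= major <= 0xFFFF and 0 <= minor <= 0xFFFF
    return (major << 16) | minor

def next_free_minor(used, minor_range):
    # Pick the first unused minor from a configured range, skipping
    # minors that buggy user libraries cannot handle.
    for minor in minor_range:
        if minor not in used:
            return minor
    raise RuntimeError("net_cls minor handles exhausted")

# Hypothetical agent flag, e.g. --net_cls_minor_handles=0x0010-0x0FFF
allowed = range(0x0010, 0x1000)
minor = next_free_minor({0x0010, 0x0011}, allowed)
print(hex(net_cls_classid(0x0012, minor)))  # 0x120012
```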



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3368) Add device support in cgroups abstraction

2016-02-22 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-3368:
---
Assignee: Abhishek Dasgupta  (was: Kevin Klues)

> Add device support in cgroups abstraction
> -
>
> Key: MESOS-3368
> URL: https://issues.apache.org/jira/browse/MESOS-3368
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Abhishek Dasgupta
>
> Add support for [device 
> cgroups|https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt] to 
> aid isolators controlling access to devices.
> In the future, we could think about how to enumerate and control access to 
> devices as a resource or task/container policy.
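For context, v1 device-cgroup access is controlled by writing entries like "c 1:3 rwm" to devices.allow / devices.deny. A minimal sketch of what a cgroups-layer helper might look like (helper names and paths are hypothetical, not the actual Mesos abstraction):

```python
def device_entry(dev_type: str, major: int, minor: int, access: str = "rwm") -> str:
    # Format a v1 devices-cgroup control entry,
    # e.g. "c 1:3 rwm" for /dev/null (char device 1:3).
    return f"{dev_type} {major}:{minor} {access}"

def allow_device(cgroup_path: str, entry: str) -> None:
    # Writing the entry to devices.allow grants the access
    # (requires root and a mounted devices cgroup hierarchy).
    with open(f"{cgroup_path}/devices.allow", "w") as f:
        f.write(entry)

print(device_entry("c", 1, 3))  # c 1:3 rwm
```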



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4623) Add a stub Nvidia GPU isolator.

2016-02-22 Thread Abhishek Dasgupta (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157241#comment-15157241
 ] 

Abhishek Dasgupta commented on MESOS-4623:
--

ok..no worries

> Add a stub Nvidia GPU isolator.
> ---
>
> Key: MESOS-4623
> URL: https://issues.apache.org/jira/browse/MESOS-4623
> Project: Mesos
>  Issue Type: Task
>  Components: isolation
>Reporter: Benjamin Mahler
>Assignee: Kevin Klues
>
> We'll first wire up a skeleton Nvidia GPU isolator, which needs to be guarded 
> by a configure flag due to the dependency on NVML.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4623) Add a stub Nvidia GPU isolator.

2016-02-22 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157238#comment-15157238
 ] 

Kevin Klues commented on MESOS-4623:


I already have most of this in place; it is just not out for review yet. I've 
been developing on the nvidia-isolator branch of my mesos fork.

https://github.com/klueska-mesosphere/mesos/tree/nvidia-isolator

> Add a stub Nvidia GPU isolator.
> ---
>
> Key: MESOS-4623
> URL: https://issues.apache.org/jira/browse/MESOS-4623
> Project: Mesos
>  Issue Type: Task
>  Components: isolation
>Reporter: Benjamin Mahler
>Assignee: Kevin Klues
>
> We'll first wire up a skeleton Nvidia GPU isolator, which needs to be guarded 
> by a configure flag due to the dependency on NVML.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4623) Add a stub Nvidia GPU isolator.

2016-02-22 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues reassigned MESOS-4623:
--

Assignee: Kevin Klues

> Add a stub Nvidia GPU isolator.
> ---
>
> Key: MESOS-4623
> URL: https://issues.apache.org/jira/browse/MESOS-4623
> Project: Mesos
>  Issue Type: Task
>  Components: isolation
>Reporter: Benjamin Mahler
>Assignee: Kevin Klues
>
> We'll first wire up a skeleton Nvidia GPU isolator, which needs to be guarded 
> by a configure flag due to the dependency on NVML.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4424) Initial support for GPU resources.

2016-02-22 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues reassigned MESOS-4424:
--

Assignee: Kevin Klues

> Initial support for GPU resources.
> --
>
> Key: MESOS-4424
> URL: https://issues.apache.org/jira/browse/MESOS-4424
> Project: Mesos
>  Issue Type: Epic
>  Components: isolation
>Reporter: Benjamin Mahler
>Assignee: Kevin Klues
>
> Mesos already has generic mechanisms for expressing / isolating resources, 
> and we'd like to expose GPUs as resources that can be consumed and isolated. 
> However, GPUs present unique challenges:
> * Users may rely on vendor-specific libraries to interact with the device 
> (e.g. CUDA, HSA, etc), others may rely on portable libraries like OpenCL or 
> OpenGL. These libraries need to be available from within the container.
> * GPU hardware has many attributes that may impose scheduling constraints 
> (e.g. core count, total memory, topology (via PCI-E, NVLINK, etc), driver 
> versions, etc).
> * Obtaining utilization information requires vendor-specific approaches.
> * Isolated sharing of a GPU device requires vendor-specific approaches.
> As such, the focus is on supporting a narrow initial use case: homogeneous 
> device-level GPU support:
> * Fractional sharing of GPU devices across containers will not be supported 
> initially, unlike CPU cores.
> * Heterogeneity will be supported via other means for now (e.g. using agent 
> attributes to differentiate hardware profiles, using portable libraries like 
> OpenCL, etc).
> Working group email list: https://groups.google.com/forum/#!forum/mesos-gpus



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3368) Add device support in cgroups abstraction

2016-02-22 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues reassigned MESOS-3368:
--

Assignee: Kevin Klues

> Add device support in cgroups abstraction
> -
>
> Key: MESOS-3368
> URL: https://issues.apache.org/jira/browse/MESOS-3368
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Kevin Klues
>
> Add support for [device 
> cgroups|https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt] to 
> aid isolators controlling access to devices.
> In the future, we could think about how to enumerate and control access to 
> devices as a resource or task/container policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4595) Add support for newest pre-defined Perf events to PerfEventIsolator

2016-02-22 Thread Bartek Plotka (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bartek Plotka updated MESOS-4595:
-
Description: 
Currently, Perf Event Isolator is able to monitor all (specified in 
{{--perf_events=...}}) Perf Events, but it can map only part of them in 
{{ResourceUsage.proto}} (to be more exact in [PerfStatistics.proto | 
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L862])

Since the last time {{PerfStatistics.proto}} was updated, the list of supported 
events has expanded considerably and is growing constantly. I have created a 
comparison table:

|| Events type || Num of matched events in PerfStatistics vs perf 4.3.3 || perf 
4.3.3 events ||
| HW events  | 8  | 8  |
| SW events | 9 | 10 |
| HW cache event | 20 | 20 |
| *Kernel PMU events* | *0* | *37* |
| Tracepoint events | 0 | billion (: |

For advanced analysis (e.g. during Oversubscription in the QoS Controller), 
having support for additional events is crucial. For instance in 
[Serenity|https://github.com/mesosphere/serenity] we based some of our 
revocation algorithms on the new [CMT| 
https://01.org/packet-processing/cache-monitoring-technology-memory-bandwidth-monitoring-cache-allocation-technology-code-and-data]
 feature which gives additional, useful event called {{llc_occupancy}}.

I think we all agree that it would be great to support more (or even all) perf 
events in {{Mesos PerfEventIsolator}} (:

Let's start a discussion about the approach. Within this task we have three 
issues:
# What events do we want to support in Mesos?
## all?
## only add Kernel PMU Events?
---
I don't have a strong opinion on that, since I have never used {{Tracepoint 
events}}. We currently need PMU events.
# How to add new (or modify existing) events in {{mesos.proto}}?
We can distinguish here 3 approaches:
*# Add new events statically in {{PerfStatistics.proto}} as separate optional 
fields. (like it is currently)
*# Instead of optional fields in {{PerfStatistics.proto}} message we could have 
a {{key-value}} map (something like {{labels}} in other messages) and feed it 
dynamically in {{PerfEventIsolator}}
*# We could mix above approaches and just add mentioned map to existing 
{{PerfStatistics.proto}} for additional events (:
---
IMO: Approach 1 is explicit in that users can see which events to expect 
(although they are parsed in a different manner, e.g. {{"-"}} to {{"_"}}), but 
we would end up with a very long message and a lot of copy-paste work. And we 
have to maintain that!
Approaches 2 and 3 are more flexible, and we don't have the problem mentioned 
in the issue below (: And we *always* support *all* perf events in all kernel 
versions (:
IMO, approaches 2 and 3 are the best.
# How to support different naming formats? For instance 
{{intel_cqm/llc_occupancy/}} with {{"/"}} in the name, or 
{{migrate:mm_migrate_pages}} with {{":"}}. I don't think these can be used as 
field names in {{.proto}} syntax.

Currently, approach #3 is chosen. (Adding dynamic map to existing 
{{PerfStatistics.proto}} for additional events specified in 
{{--perf_events=...}})
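To make approach #3 concrete, here is a small sketch of the data model (in Python, purely illustrative rather than the actual C++ isolator code) showing why a dynamic map sidesteps the naming problem in issue #3: event names containing "/" or ":" are invalid as proto field names but are perfectly fine as map keys:

```python
import re

def to_proto_field(event: str) -> str:
    # Approach 1: sanitize an event name into a legal proto field name.
    # Names like "intel_cqm/llc_occupancy/" cannot round-trip this way.
    return re.sub(r"[^a-z0-9_]", "_", event.lower()).strip("_")

def to_event_map(raw_counts) -> dict:
    # Approaches 2/3: keep the raw perf event name as a map key,
    # no sanitization or per-event proto field required.
    return dict(raw_counts)

counts = [("cycles", 123456), ("intel_cqm/llc_occupancy/", 789),
          ("migrate:mm_migrate_pages", 4)]
stats = to_event_map(counts)
print(to_proto_field("intel_cqm/llc_occupancy/"))  # intel_cqm_llc_occupancy
print(stats["migrate:mm_migrate_pages"])           # 4
```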



[jira] [Commented] (MESOS-4269) Minor typo in src/linux/cgroups.cpp

2016-02-22 Thread Ryuichi Okumura (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157038#comment-15157038
 ] 

Ryuichi Okumura commented on MESOS-4269:


Any takers?

> Minor typo in src/linux/cgroups.cpp
> ---
>
> Key: MESOS-4269
> URL: https://issues.apache.org/jira/browse/MESOS-4269
> Project: Mesos
>  Issue Type: Bug
>Reporter: Ryuichi Okumura
>Assignee: Ryuichi Okumura
>Priority: Minor
>
> There is a typo in the "src/linux/cgroups.cpp" as follows.
> https://github.com/apache/mesos/blob/765c025dd43e04360b29c19bd9a66837954c5a20/src/linux/cgroups.cpp#L1438



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4547) Introduce TASK_KILLING state.

2016-02-22 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156825#comment-15156825
 ] 

Guangya Liu commented on MESOS-4547:


A new file was added for the command executor related to TASK_KILLING, but 
there is no RR for it: 
https://github.com/apache/mesos/blob/master/src/tests/command_executor_tests.cpp

> Introduce TASK_KILLING state.
> -
>
> Key: MESOS-4547
> URL: https://issues.apache.org/jira/browse/MESOS-4547
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>Assignee: Abhishek Dasgupta
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> Currently there is no state to express that a task is being killed, but is 
> not yet killed (see MESOS-4140). In a similar way to how we have 
> TASK_STARTING to indicate the task is starting but not yet running, a 
> TASK_KILLING state would indicate the task is being killed but is not yet 
> killed.
> This would need to be guarded by a framework capability to protect old 
> frameworks that cannot understand the TASK_KILLING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4735) CommandInfo.URI should allow specifying target filename

2016-02-22 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu reassigned MESOS-4735:
--

Assignee: Guangya Liu

> CommandInfo.URI should allow specifying target filename
> ---
>
> Key: MESOS-4735
> URL: https://issues.apache.org/jira/browse/MESOS-4735
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 0.27.0
>Reporter: Erik Weathers
>Assignee: Guangya Liu
>Priority: Minor
>
> The {{CommandInfo.URI}} message should allow explicitly choosing the 
> downloaded file's name, to better mimic functionality present in tools like 
> {{wget}} and {{curl}}.
> This relates to issues when the {{CommandInfo.URI}} is pointing to a URL that 
> has query parameters at the end of the path, resulting in the downloaded 
> filename containing those elements. This also prevents extraction of such 
> files, since the extraction logic simply looks at the file's suffix. See 
> MESOS-3367, MESOS-1686, and MESOS-1509 for more info. If this issue were 
> fixed, I could work around the other issues not being fixed by modifying 
> my framework's scheduler to set the target filename.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4735) CommandInfo.URI should allow specifying target filename

2016-02-22 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156822#comment-15156822
 ] 

Guangya Liu commented on MESOS-4735:


[~erikdw] Can you please give more detail on your desired URI, with an 
example? I'm a bit confused: if only a file name is given, how can the fetcher 
get the file?

> CommandInfo.URI should allow specifying target filename
> ---
>
> Key: MESOS-4735
> URL: https://issues.apache.org/jira/browse/MESOS-4735
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 0.27.0
>Reporter: Erik Weathers
>Priority: Minor
>
> The {{CommandInfo.URI}} message should allow explicitly choosing the 
> downloaded file's name, to better mimic functionality present in tools like 
> {{wget}} and {{curl}}.
> This relates to issues when the {{CommandInfo.URI}} is pointing to a URL that 
> has query parameters at the end of the path, resulting in the downloaded 
> filename containing those elements. This also prevents extraction of such 
> files, since the extraction logic simply looks at the file's suffix. See 
> MESOS-3367, MESOS-1686, and MESOS-1509 for more info. If this issue were 
> fixed, I could work around the other issues not being fixed by modifying 
> my framework's scheduler to set the target filename.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-02-22 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3937:
--
Assignee: (was: Till Toenshoff)

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a
> I1117 15:08:09.296115 26399 master.cpp:1619] Elected as the leading master!
> I1117 15:08:09.2

[jira] [Commented] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-02-22 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156763#comment-15156763
 ] 

Till Toenshoff commented on MESOS-3937:
---

Reopening this ticket to suggest the already mentioned "medium-term" solution, 
as also hinted at by the last comment on 
https://reviews.apache.org/r/40748/.

We might want to try to identify the lack of a resolvable hostname before 
attempting to run the above tests. Doing so within the test environment 
analysis would allow us to exclude these tests while also making the user 
aware of the configuration problem.

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Till Toenshoff
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the 

[jira] [Created] (MESOS-4735) CommandInfo.URI should allow specifying target filename

2016-02-22 Thread Erik Weathers (JIRA)
Erik Weathers created MESOS-4735:


 Summary: CommandInfo.URI should allow specifying target filename
 Key: MESOS-4735
 URL: https://issues.apache.org/jira/browse/MESOS-4735
 Project: Mesos
  Issue Type: Improvement
  Components: fetcher
Affects Versions: 0.27.0
Reporter: Erik Weathers
Priority: Minor


The {{CommandInfo.URI}} message should allow explicitly choosing the downloaded 
file's name, to better mimic functionality present in tools like {{wget}} and 
{{curl}}.

This relates to issues when the {{CommandInfo.URI}} is pointing to a URL that 
has query parameters at the end of the path, resulting in the downloaded 
filename containing those elements. This also prevents extraction of such 
files, since the extraction logic simply looks at the file's suffix. See 
MESOS-3367, MESOS-1686, and MESOS-1509 for more info. If this issue were fixed, 
I could work around the other issues not being fixed by modifying my 
framework's scheduler to set the target filename.
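The suffix problem can be illustrated with a quick sketch (the helper names and the `output_file` parameter are mine, not the fetcher's actual code): the basename of a URI with query parameters keeps the query in the name, so an extension check like ".tar.gz" fails unless the framework can name the target file explicitly:

```python
import os
from urllib.parse import urlparse

def naive_name(uri):
    # What a naive fetcher does: basename of the raw URI string,
    # which keeps any query/fragment in the filename.
    return os.path.basename(uri)

def target_name(uri, output_file=None):
    # Hypothetical fix: honor an explicit output_file, else strip the
    # query/fragment and take the basename of the path component.
    return output_file or os.path.basename(urlparse(uri).path)

uri = "https://example.com/pkg/archive.tar.gz?sig=abc&x=1"
print(naive_name(uri))    # archive.tar.gz?sig=abc&x=1 -> suffix check fails
print(target_name(uri))   # archive.tar.gz
```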



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4547) Introduce TASK_KILLING state.

2016-02-22 Thread Abhishek Dasgupta (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156737#comment-15156737
 ] 

Abhishek Dasgupta commented on MESOS-4547:
--

Yes, there are test cases and docs for this feature:
For docs:
https://reviews.apache.org/r/43827/
https://reviews.apache.org/r/43821/

[~bmahler] Could you please provide the RRs for the test cases here, if any?

> Introduce TASK_KILLING state.
> -
>
> Key: MESOS-4547
> URL: https://issues.apache.org/jira/browse/MESOS-4547
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>Assignee: Abhishek Dasgupta
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> Currently there is no state to express that a task is being killed, but is 
> not yet killed (see MESOS-4140). In a similar way to how we have 
> TASK_STARTING to indicate the task is starting but not yet running, a 
> TASK_KILLING state would indicate the task is being killed but is not yet 
> killed.
> This would need to be guarded by a framework capability to protect old 
> frameworks that cannot understand the TASK_KILLING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4547) Introduce TASK_KILLING state.

2016-02-22 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156719#comment-15156719
 ] 

Bernd Mathiske commented on MESOS-4547:
---

The RR for tests (https://reviews.apache.org/r/43490/) has been discarded. Are 
there going to be tests and documentation for this feature?

> Introduce TASK_KILLING state.
> -
>
> Key: MESOS-4547
> URL: https://issues.apache.org/jira/browse/MESOS-4547
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>Assignee: Abhishek Dasgupta
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> Currently there is no state to express that a task is being killed, but is 
> not yet killed (see MESOS-4140). In a similar way to how we have 
> TASK_STARTING to indicate the task is starting but not yet running, a 
> TASK_KILLING state would indicate the task is being killed but is not yet 
> killed.
> This would need to be guarded by a framework capability to protect old 
> frameworks that cannot understand the TASK_KILLING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4734) Add running cluster upgrade section for 0.25 => 0.26 and 0.26 => 0.27

2016-02-22 Thread Michael Park (JIRA)
Michael Park created MESOS-4734:
---

 Summary: Add running cluster upgrade section for 0.25 => 0.26 and 
0.26 => 0.27
 Key: MESOS-4734
 URL: https://issues.apache.org/jira/browse/MESOS-4734
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Michael Park


In {{docs/upgrades.md}}, the 0.25 to 0.26 and 0.26 to 0.27 sections are 
missing the "In order to upgrade a running cluster" steps. We should add these 
sections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)