[jira] [Commented] (MESOS-6656) Nested containers can become unkillable

2017-08-03 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113839#comment-16113839
 ] 

Benjamin Mahler commented on MESOS-6656:


Bug filed here: MESOS-7858

> Nested containers can become unkillable
> ---
>
> Key: MESOS-6656
> URL: https://issues.apache.org/jira/browse/MESOS-6656
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization
>Reporter: Greg Mann
>  Labels: nested
>
> An incident occurred recently in a cluster running a build of Mesos based on 
> commit {{757319357471227c0a1e906076eae8f9aa2fdbd6}} from master. A task group 
> of five tasks was launched via Marathon. After the tasks were launched, one 
> of the containers quickly exited and was successfully destroyed. A couple of
> minutes later, the task group was killed manually via Marathon, and the agent 
> can then be seen repeatedly attempting to kill the tasks for hours. No calls 
> to {{WAIT_NESTED_CONTAINER}} are visible in the agent logs, and the executor 
> logs do not indicate at any point that the nested containers were launched 
> successfully.
> Agent logs:
> {code}
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.890911  
> 6406 slave.cpp:1539] Got assigned task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> framework ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.892299  
> 6406 gc.cpp:83] Unscheduling 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-'
>  from gc
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.892379  
> 6406 gc.cpp:83] Unscheduling 
> '/var/lib/mesos/slave/meta/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-'
>  from gc
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.893131  
> 6405 slave.cpp:1701] Launching task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> framework ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.893435  
> 6405 paths.cpp:536] Trying to chown 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-/executors/instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581/runs/8750c2a7-8bef-4a69-8ef2-b873f884bf91'
>  to user 'root'
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.898026  
> 6405 slave.cpp:6179] Launching executor 
> 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of framework 
> ce4bd8be-1198-4819-81d4-9a8439439741- with resources cpus(*):0.1; 
> mem(*):32; disk(*):10; ports(*):[21421-21425] in work directory 
> '/var/lib/mesos/slave/slaves/ce4bd8be-1198-4819-81d4-9a8439439741-S1/frameworks/ce4bd8be-1198-4819-81d4-9a8439439741-/executors/instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581/runs/8750c2a7-8bef-4a69-8ef2-b873f884bf91'
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.898731  
> 6407 docker.cpp:1000] Skipping non-docker container
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.899050  
> 6407 containerizer.cpp:938] Starting container 
> 8750c2a7-8bef-4a69-8ef2-b873f884bf91 for executor 
> 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of framework 
> ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 mesos-agent[6397]: I1129 04:04:16.899909  
> 6405 slave.cpp:1987] Queued task group containing tasks [ 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server1, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server2, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server3, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server4, 
> dat_scout.instance-e57be1fe-b5e8-11e6-995b-70b3d581.scout-server5 ] for 
> executor 'instance-dat_scout.e57be1fe-b5e8-11e6-995b-70b3d581' of 
> framework ce4bd8be-1198-4819-81d4-9a8439439741-
> Nov 29 04:04:16 ip-10-190-112-199 

[jira] [Created] (MESOS-7858) Launching a nested container with namespace/pid isolation, with glibc < 2.25, may deadlock the LinuxLauncher and MesosContainerizer

2017-08-03 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-7858:


 Summary: Launching a nested container with namespace/pid 
isolation, with glibc < 2.25, may deadlock the LinuxLauncher and 
MesosContainerizer
 Key: MESOS-7858
 URL: https://issues.apache.org/jira/browse/MESOS-7858
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 1.3.0
Reporter: Joseph Wu


This bug in glibc (fixed in glibc 2.25) will sometimes cause a child process of 
a {{fork}} to {{assert}} incorrectly, if the parent enters a new pid namespace 
before forking: 
https://sourceware.org/bugzilla/show_bug.cgi?id=15392
https://sourceware.org/bugzilla/show_bug.cgi?id=21386

The LinuxLauncher code happens to do this when launching nested containers:
* The MesosContainerizer process launches a subprocess, with a customized 
{{ns::clone}} function as an argument.  The thread then basically waits for the 
launch to succeed and return a child PID: 
https://github.com/apache/mesos/blob/1.3.x/src/slave/containerizer/mesos/linux_launcher.cpp#L495
* A separate thread in the Mesos agent forks and then waits for the grandchild 
to report a PID: 
https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L453
* The child of the fork first enters the namespaces (including a pid namespace) 
and then forks a grandchild.  The child then calls {{waitpid}} on the 
grandchild: https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L555
* Due to the glibc bug, the grandchild sometimes never returns from the 
{{fork}} here: https://github.com/apache/mesos/blob/1.3.x/src/linux/ns.hpp#L540

According to the glibc bug report, this can be worked around as follows:
{quote}
The obvious solution is just to use clone() after setns() and never use fork() 
- and one can certainly patch both programs to do so. Nevertheless it would be 
nice to see if fork() also worked after setns(), especially since there is no 
inherent reason for it not to.
{quote}
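The workaround in the quote comes down to spawning the grandchild with clone() instead of fork() once the child has entered the target namespaces. Below is a minimal C++ sketch, assuming Linux and glibc's clone() wrapper. The setns() step is left out because it requires CAP_SYS_ADMIN, and {{spawnWithClone}} and {{grandchild}} are hypothetical names, not Mesos functions:

```cpp
#include <cassert>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical grandchild entry point; a real launcher would exec the
// container's init process here.
static int grandchild(void*) {
  _exit(42);
}

// Spawns `fn` with clone(2) rather than fork(2) and returns the child's exit
// status, or -1 on error. In the launcher this would run after setns().
int spawnWithClone(int (*fn)(void*)) {
  assert(fn != nullptr);
  // Child stack; the stack grows down on x86 and ARM, so pass its top.
  std::vector<char> stack(64 * 1024);
  // Without CLONE_VM the child gets its own copy of the address space, as
  // with fork(); SIGCHLD makes it reapable with a plain waitpid().
  pid_t pid = ::clone(fn, stack.data() + stack.size(), SIGCHLD, nullptr);
  if (pid < 0) {
    return -1;
  }
  int status = 0;
  if (::waitpid(pid, &status, 0) != pid) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```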



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7853) Support shared PID namespace.

2017-08-03 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang updated MESOS-7853:
--
Sprint: Mesosphere Sprint 61

> Support shared PID namespace.
> -
>
> Key: MESOS-7853
> URL: https://issues.apache.org/jira/browse/MESOS-7853
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Qian Zhang
>  Labels: containerizer, mesosphere, namespaces
>
> Currently, with the 'namespaces/pid' isolator enabled, each container has 
> its own pid namespace. This does not meet the needs of some scenarios. 
> For example, a task under an executor container may want to reach another 
> task under the same executor, which requires them to share a pid namespace.
> We should make the container pid namespace configurable, so that users can 
> choose whether a container shares its parent's pid namespace.
> User facing API:
> {noformat}
> message LinuxInfo {
>   ..
>   // True if it shares the pid namespace with its parent. If the
>   // container is a top level container, it means share the pid
>   // namespace with the agent. If the container is a nested
>   // container, it means share the pid namespace with its parent
>   // container. This field will be ignored if 'namespaces/pid'
>   // isolator is not enabled.
>   optional bool share_pid_namespace = 4;
> }
> {noformat}
> A new agent flag:
> --disallow_top_level_pid_ns_sharing (defaults to false)
> This addresses a security concern from the operator's perspective: while some 
> nested containers may share the pid namespace of their parents, the operator 
> can ensure that top-level containers never share the pid namespace with the agent.
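The proposed semantics can be summarized as a small decision function. This C++ sketch is not Mesos code: {{PidNs}} and {{decidePidNamespace}} are hypothetical, and treating a disallowed top-level sharing request as a rejection (rather than silently creating a new namespace) is an assumption:

```cpp
#include <cassert>

// Hypothetical outcomes for a container's pid namespace.
enum class PidNs {
  Ignored,  // 'namespaces/pid' is not enabled: the field has no effect
  New,      // the container gets its own pid namespace
  Share,    // share with the parent container (or the agent, if top level)
  Reject    // sharing requested but forbidden by the operator (assumed)
};

// Sketch of the semantics proposed in MESOS-7853; not the Mesos implementation.
PidNs decidePidNamespace(bool isolatorEnabled,
                         bool isNested,
                         bool sharePidNamespace,
                         bool disallowTopLevelSharing) {
  if (!isolatorEnabled) {
    return PidNs::Ignored;
  }
  if (!sharePidNamespace) {
    return PidNs::New;
  }
  // A top-level container would share the agent's pid namespace, which the
  // operator can forbid with --disallow_top_level_pid_ns_sharing.
  if (!isNested && disallowTopLevelSharing) {
    return PidNs::Reject;
  }
  return PidNs::Share;
}
```

Note that the operator flag only restricts top-level containers; a nested container asking to share its parent's namespace is unaffected by it.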





[jira] [Updated] (MESOS-7814) Improve the test frameworks.

2017-08-03 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7814:
--
Sprint:   (was: Mesosphere Sprint 61)

> Improve the test frameworks.
> 
>
> Key: MESOS-7814
> URL: https://issues.apache.org/jira/browse/MESOS-7814
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework
>Reporter: Armand Grillet
>Assignee: Armand Grillet
>Priority: Minor
>  Labels: mesosphere, newbie
>
> These improvements include three main points:
> * Adding a {{name}} flag to certain frameworks to distinguish between 
> instances.
> * Cleaning up the code style of the frameworks.
> * For frameworks with custom executors, such as the balloon framework, adding an 
> {{executor_extra_uris}} flag containing URIs that will be passed to the 
> {{command_info}} of the executor.





[jira] [Updated] (MESOS-7840) Add Mesos CLI command to list active tasks

2017-08-03 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7840:
--
Sprint:   (was: Mesosphere Sprint 61)

> Add Mesos CLI command to list active tasks
> --
>
> Key: MESOS-7840
> URL: https://issues.apache.org/jira/browse/MESOS-7840
> Project: Mesos
>  Issue Type: Improvement
>  Components: cli
>Reporter: Armand Grillet
>Assignee: Armand Grillet
>
> We need to add a command to list all the tasks running in a Mesos cluster by 
> checking the endpoint {{/tasks}} and reporting the results.





[jira] [Updated] (MESOS-7148) Compare the performance of the replicated log after upgrade to leveldb 1.19

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7148:
--
Target Version/s:   (was: 1.4.0)

> Compare the performance of the replicated log after upgrade to leveldb 1.19
> ---
>
> Key: MESOS-7148
> URL: https://issues.apache.org/jira/browse/MESOS-7148
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>Assignee: Tomasz Janiszewski
>
> We need to use {{./mesos-log benchmark}} to benchmark the replicated log, or 
> add a new benchmark test to automate this.





[jira] [Updated] (MESOS-7278) Implement configuration reader/writer for the new CLI

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7278:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Implement configuration reader/writer for the new CLI
> -
>
> Key: MESOS-7278
> URL: https://issues.apache.org/jira/browse/MESOS-7278
> Project: Mesos
>  Issue Type: Task
>  Components: cli
>Affects Versions: 1.3.0
>Reporter: Eric Chung
>Assignee: Eric Chung
>






[jira] [Updated] (MESOS-7141) Support hook scripts to customize actions for container's lifecycle

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7141:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Support hook scripts to customize actions for container's lifecycle
> ---
>
> Key: MESOS-7141
> URL: https://issues.apache.org/jira/browse/MESOS-7141
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jason Lai
>Assignee: Jason Lai
>  Labels: containerizer, hooks
>
> Inspired by [hooks | 
> https://github.com/opencontainers/runtime-spec/blob/master/config.md#hooks] 
> in [OCI's runtime spec | https://github.com/opencontainers/runtime-spec], it 
> would be great to have scripts hooked into the lifecycle of containers.
> The OCI runtime spec defines three hook stages:
> * Prestart
> * Poststart
> * Poststop
> We can consider having the 3 stages to start with.
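A rough C++ sketch of how the three stages might be dispatched, assuming the failure policy we understand the OCI spec to prescribe: a failing prestart hook aborts the launch, while poststart and poststop failures are only reported. {{HookStage}}, {{Hook}} and {{runHooks}} are hypothetical names, and each hook is modeled as a callable returning an exit status rather than an actual script invocation:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// The three OCI hook stages (hypothetical enum for this sketch).
enum class HookStage { Prestart, Poststart, Poststop };

// A hook is modeled as a callable returning its exit status (0 = success).
using Hook = std::function<int()>;

// Runs the hooks of one stage in order. Returns false if the container launch
// should be aborted; non-fatal failures are appended to `warnings`.
bool runHooks(HookStage stage,
              const std::vector<Hook>& hooks,
              std::vector<std::string>& warnings) {
  for (std::size_t i = 0; i < hooks.size(); ++i) {
    const int status = hooks[i]();
    if (status == 0) {
      continue;
    }
    if (stage == HookStage::Prestart) {
      return false;  // a failed prestart hook aborts the launch
    }
    // Poststart/poststop failures are recorded; the remaining hooks still run.
    warnings.push_back("hook " + std::to_string(i) + " exited with status " +
                       std::to_string(status));
  }
  return true;
}
```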





[jira] [Updated] (MESOS-7147) Create a BENCHMARK test for replicated log

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7147:
--
Target Version/s:   (was: 1.4.0)

> Create a BENCHMARK test for replicated log
> --
>
> Key: MESOS-7147
> URL: https://issues.apache.org/jira/browse/MESOS-7147
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>Assignee: Tomasz Janiszewski
>
> Refer to the email in http://search-hadoop.com/m/Mesos/0Vlr6IhDnC10qXs31
> From Jie:
> {quote}
> We probably should add a BENCHMARK test to our test suite so that this can
> be automated. For instance, initialize a replicated log with a single
> replica, and perform a bunch of writes with random sizes.
> {quote}
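Jie's suggestion can be sketched as the following C++ harness. {{InMemoryLog}} is a hypothetical stand-in for a single-replica replicated log (a real BENCHMARK test would append through the replicated log's writer instead), and the size range and seed are arbitrary parameters:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for a replicated log with a single replica.
struct InMemoryLog {
  std::vector<std::string> entries;
  void append(const std::string& data) { entries.push_back(data); }
};

struct BenchResult {
  std::size_t writes;
  std::size_t bytes;
  double seconds;
};

// Appends `count` entries whose sizes are drawn uniformly from
// [minSize, maxSize], timing the whole run (payload creation included).
BenchResult benchmarkRandomWrites(InMemoryLog& log,
                                  std::size_t count,
                                  std::size_t minSize,
                                  std::size_t maxSize,
                                  unsigned seed) {
  assert(minSize <= maxSize);
  std::mt19937 rng(seed);  // fixed seed so runs are comparable
  std::uniform_int_distribution<std::size_t> sizeDist(minSize, maxSize);

  std::size_t bytes = 0;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < count; ++i) {
    const std::string payload(sizeDist(rng), 'x');
    bytes += payload.size();
    log.append(payload);
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return {count, bytes, elapsed.count()};
}
```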





[jira] [Commented] (MESOS-7141) Support hook scripts to customize actions for container's lifecycle

2017-08-03 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113695#comment-16113695
 ] 

Anand Mazumdar commented on MESOS-7141:
---

Retargeting this to 1.5.0






[jira] [Commented] (MESOS-7473) Use "-dev" prerelease label for version during development

2017-08-03 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113694#comment-16113694
 ] 

Anand Mazumdar commented on MESOS-7473:
---

Retargeting it for 1.5

> Use "-dev" prerelease label for version during development
> --
>
> Key: MESOS-7473
> URL: https://issues.apache.org/jira/browse/MESOS-7473
> Project: Mesos
>  Issue Type: Task
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> Prior discussion:
> https://lists.apache.org/thread.html/6e291c504fd44b79e452744b80073cb33adc1be85c17e22bbca35a6c@%3Cdev.mesos.apache.org%3E
> https://lists.apache.org/thread.html/eb526c9295b3cf8e4efc7e0a7d2dacabb61ab5ed867a05e7d913d3fb@%3Cdev.mesos.apache.org%3E





[jira] [Updated] (MESOS-7473) Use "-dev" prerelease label for version during development

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7473:
--
Target Version/s: 1.5.0  (was: 1.4.0)






[jira] [Commented] (MESOS-7807) Docker executor needs to return multiple IP addresses for the container

2017-08-03 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113693#comment-16113693
 ] 

Anand Mazumdar commented on MESOS-7807:
---

[~avinash.mesos] I am retargeting this for 1.5.0. Please feel free to change it 
back to 1.4 if you feel otherwise.

> Docker executor needs to return multiple IP addresses for the container
> ---
>
> Key: MESOS-7807
> URL: https://issues.apache.org/jira/browse/MESOS-7807
> Project: Mesos
>  Issue Type: Task
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: Mesosphere
>
> `Docker executor` currently returns only a single IP address for each Docker 
> container. In a world where a container has both a v4 and a v6 address, the 
> executor needs to return all the addresses it sees for the container; 
> otherwise we won't be able to support dual-stack containers.
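A C++ sketch of the intended behavior change: keep every valid address, tagged as IPv4 or IPv6, instead of only the first one. How the executor discovers candidate addresses is left out; {{ContainerIp}} and {{collectAddresses}} are hypothetical names:

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <vector>

// One address reported for a container, tagged with its family.
struct ContainerIp {
  std::string address;
  int family;  // AF_INET or AF_INET6
};

// Keeps every valid IPv4 and IPv6 address instead of only the first one.
// `candidates` is a hypothetical list of addresses seen for the container.
std::vector<ContainerIp> collectAddresses(
    const std::vector<std::string>& candidates) {
  std::vector<ContainerIp> result;
  for (const std::string& candidate : candidates) {
    in_addr v4;
    in6_addr v6;
    if (inet_pton(AF_INET, candidate.c_str(), &v4) == 1) {
      result.push_back({candidate, AF_INET});
    } else if (inet_pton(AF_INET6, candidate.c_str(), &v6) == 1) {
      result.push_back({candidate, AF_INET6});
    }
    // Anything else is skipped rather than failing the whole report.
  }
  return result;
}
```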





[jira] [Updated] (MESOS-7807) Docker executor needs to return multiple IP addresses for the container

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7807:
--
Target Version/s: 1.5.0  (was: 1.4.0)






[jira] [Assigned] (MESOS-5254) Add URI parsing function/library

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-5254:
-

Assignee: (was: Joseph Wu)

> Add URI parsing function/library
> 
>
> Key: MESOS-5254
> URL: https://issues.apache.org/jira/browse/MESOS-5254
> Project: Mesos
>  Issue Type: Task
>  Components: fetcher, libprocess
>Reporter: Joseph Wu
>  Labels: mesosphere
>
> The {{uri::Fetcher}} theoretically supports all URIs, per 
> [RFC3986|http://tools.ietf.org/html/rfc3986].  To do this, we need a 
> spec-compliant parser from string to URI.
> [uriparser|http://uriparser.sourceforge.net/] appears to fit the bill.
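As a partial step, RFC 3986 Appendix B gives a regular expression that splits any URI reference into its five components. The C++ sketch below applies it with {{std::regex}}. It does not validate the URI or distinguish empty from absent components, so a full parser such as uriparser would still be needed for spec compliance. {{UriParts}} and {{splitUri}} are hypothetical names:

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Components of a URI reference as split by RFC 3986, Appendix B.
struct UriParts {
  std::string scheme;
  std::string authority;
  std::string path;
  std::string query;
  std::string fragment;
};

// Splits (but does not validate) a URI reference using the regular
// expression from RFC 3986, Appendix B.
std::optional<UriParts> splitUri(const std::string& uri) {
  static const std::regex pattern(
      R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)re");
  std::smatch m;
  if (!std::regex_match(uri, m, pattern)) {
    return std::nullopt;
  }
  // Capture groups 2, 4, 5, 7 and 9 hold the five components; unmatched
  // groups yield empty strings.
  return UriParts{m[2].str(), m[4].str(), m[5].str(), m[7].str(), m[9].str()};
}
```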





[jira] [Updated] (MESOS-7303) Support Isolator capabilities.

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7303:
--
Shepherd: Jie Yu

> Support Isolator capabilities.
> --
>
> Key: MESOS-7303
> URL: https://issues.apache.org/jira/browse/MESOS-7303
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jie Yu
>Assignee: Joseph Wu
>  Labels: mesosphere, storage
>
> Currently, isolators have one capability: whether or not they support nesting. 
> To support launching containers that are not tied to Mesos tasks or executors 
> (standalone containers), we need to add another capability to the Isolator 
> interface so that we can avoid invoking isolators that do not yet support 
> standalone containers when launching them.
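The capability filtering described above could be sketched in C++ as follows. The {{Isolator}} class here is a hypothetical, trimmed-down interface, not the real Mesos one; {{supportsStandalone()}} stands for the proposed second capability, and {{StaticIsolator}} exists only for illustration:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, trimmed-down isolator interface with two capability queries.
class Isolator {
public:
  virtual ~Isolator() = default;
  virtual std::string name() const = 0;
  virtual bool supportsNesting() const { return false; }
  virtual bool supportsStandalone() const { return false; }  // proposed
};

// Illustrative isolator whose capabilities are fixed at construction.
class StaticIsolator : public Isolator {
public:
  StaticIsolator(std::string name, bool nesting, bool standalone)
    : name_(std::move(name)), nesting_(nesting), standalone_(standalone) {}
  std::string name() const override { return name_; }
  bool supportsNesting() const override { return nesting_; }
  bool supportsStandalone() const override { return standalone_; }

private:
  std::string name_;
  bool nesting_;
  bool standalone_;
};

// Keeps only the isolators that can handle the kind of container being launched.
std::vector<std::shared_ptr<Isolator>> selectIsolators(
    const std::vector<std::shared_ptr<Isolator>>& all,
    bool nested,
    bool standalone) {
  std::vector<std::shared_ptr<Isolator>> selected;
  for (const auto& isolator : all) {
    if (nested && !isolator->supportsNesting()) {
      continue;
    }
    if (standalone && !isolator->supportsStandalone()) {
      continue;
    }
    selected.push_back(isolator);
  }
  return selected;
}
```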





[jira] [Updated] (MESOS-7303) Support Isolator capabilities.

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7303:
--
Labels: mesosphere storage  (was: storage)






[jira] [Updated] (MESOS-7801) Retry logic for unsuccessful `docker rm` during agent recovery

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7801:
--
Fix Version/s: (was: 1.4.0)

> Retry logic for unsuccessful `docker rm` during agent recovery
> --
>
> Key: MESOS-7801
> URL: https://issues.apache.org/jira/browse/MESOS-7801
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>
> In MESOS- we skip the failure when `docker rm` fails due to mount leakage 
> during agent recovery. In order not to leave residual docker containers in 
> the docker daemon, we could do a best-effort `docker rm` retry with an 
> exponential backoff since we cannot control when the leakage would be 
> terminated.
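A best-effort retry with exponential backoff, as proposed here, could look roughly like the following C++ sketch. {{retryWithBackoff}} is a hypothetical helper; the {{attempt}} callback stands in for whatever actually runs `docker rm`, and the sleep function is injectable so the logic can be exercised without waiting:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Retries `attempt` until it returns true or `maxAttempts` calls have been
// made, sleeping between attempts with a delay that doubles each time and is
// capped at `maxDelay`.
bool retryWithBackoff(
    const std::function<bool()>& attempt,
    int maxAttempts,
    std::chrono::milliseconds initialDelay,
    std::chrono::milliseconds maxDelay,
    const std::function<void(std::chrono::milliseconds)>& sleepFor =
        [](std::chrono::milliseconds d) { std::this_thread::sleep_for(d); }) {
  assert(maxAttempts > 0);
  std::chrono::milliseconds delay = initialDelay;
  for (int i = 1; i <= maxAttempts; ++i) {
    if (attempt()) {
      return true;
    }
    if (i == maxAttempts) {
      break;  // no point sleeping after the final failure
    }
    sleepFor(delay);
    delay = std::min<std::chrono::milliseconds>(delay * 2, maxDelay);
  }
  return false;
}
```

Capping the delay keeps the retries going at a bounded rate, since there is no way to know when the mount leakage will clear.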





[jira] [Assigned] (MESOS-7171) Mesos Containerizer Change Size of SHM

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-7171:
-

Assignee: (was: Joseph Wu)

> Mesos Containerizer Change Size of SHM
> --
>
> Key: MESOS-7171
> URL: https://issues.apache.org/jira/browse/MESOS-7171
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Miguel Bernadin
>Priority: Minor
>  Labels: mesosphere
>
> We would like the ability to adjust the size of the shared memory device, just 
> as can be done with Docker.
> For example, with Docker you can specify how much space to allocate as a 
> parameter in the Marathon app definition:
> {code}
>   "parameters": [
>     {
>       "key": "shm-size",
>       "value": "256mb"
>     }
>   ]
> {code}
> Below is an example of a running container; the space available on 
> {{/dev/shm}} reflects this change.
> Modified Parameter Container (app definition):
> {code}
> {
>   "id": "/ubuntu-withshm",
>   "cmd": "sleep 1000\n",
>   "cpus": 1,
>   "mem": 128,
>   "disk": 0,
>   "instances": 1,
>   "container": {
>     "type": "DOCKER",
>     "volumes": [],
>     "docker": {
>       "image": "ubuntu",
>       "network": "HOST",
>       "privileged": false,
>       "parameters": [
>         {
>           "key": "shm-size",
>           "value": "256mb"
>         }
>       ],
>       "forcePullImage": false
>     }
>   },
>   "portDefinitions": [
>     {
>       "port": 10005,
>       "protocol": "tcp",
>       "labels": {}
>     }
>   ]
> }
> {code}
> Modified Parameter Container ({{df -h}} output):
> {code}
> core@ip-10-0-0-19 ~ $ docker exec -it a818cf2277a5 bash
> root@ip-10-0-0-19:/# df -h
> Filesystem  Size  Used Avail Use% Mounted on
> overlay  37G  2.0G   33G   6% /
> tmpfs   7.4G 0  7.4G   0% /dev
> tmpfs   7.4G 0  7.4G   0% /sys/fs/cgroup
> /dev/xvdb37G  2.0G   33G   6% /etc/hostname
> shm 256M 0  256M   0% /dev/shm
> {code}
> Standard Container (app definition):
> {code}
> {
>   "id": "/ubuntu-withoutshm",
>   "cmd": "sleep 1",
>   "cpus": 1,
>   "mem": 128,
>   "disk": 0,
>   "instances": 1,
>   "container": {
>     "type": "DOCKER",
>     "volumes": [],
>     "docker": {
>       "image": "ubuntu",
>       "network": "HOST",
>       "privileged": false,
>       "parameters": [],
>       "forcePullImage": false
>     }
>   },
>   "portDefinitions": [
>     {
>       "port": 10006,
>       "protocol": "tcp",
>       "labels": {}
>     }
>   ]
> }
> {code}
> Standard Container ({{df -h}} output):
> {code}
> root@ip-10-0-0-19:/# exit
> exit
> core@ip-10-0-0-19 ~ $ docker exec -it c85433062e78 bash
> root@ip-10-0-0-19:/# df -h
> Filesystem  Size  Used Avail Use% Mounted on
> overlay  37G  2.0G   33G   6% /
> tmpfs   7.4G 0  7.4G   0% /dev
> tmpfs   7.4G 0  7.4G   0% /sys/fs/cgroup
> /dev/xvdb37G  2.0G   33G   6% /etc/hostname
> shm  64M 0   64M   0% /dev/shm
> {code}
> How can this be done with the Mesos containerizer?
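For the Mesos containerizer, one possible approach would be to mount {{/dev/shm}} as a tmpfs with an explicit size option. The C++ sketch below only builds the option string such a mount(2) call would take. Treating units like "mb" as binary (1m = 1048576 bytes) and using mode=1777 are assumptions about matching Docker's behavior, and {{parseSize}} and {{shmMountOptions}} are hypothetical helpers:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Parses sizes such as "256mb", "64m" or "1g" into bytes (binary units, assumed).
std::optional<std::uint64_t> parseSize(const std::string& text) {
  constexpr std::uint64_t kMax = std::numeric_limits<std::uint64_t>::max();
  std::size_t i = 0;
  std::uint64_t value = 0;
  for (; i < text.size() && std::isdigit(static_cast<unsigned char>(text[i])); ++i) {
    if (value > (kMax - 9) / 10) {
      return std::nullopt;  // would overflow
    }
    value = value * 10 + static_cast<std::uint64_t>(text[i] - '0');
  }
  if (i == 0) {
    return std::nullopt;  // no leading number
  }

  std::string unit;
  for (; i < text.size(); ++i) {
    unit += static_cast<char>(std::tolower(static_cast<unsigned char>(text[i])));
  }

  std::uint64_t multiplier = 0;
  if (unit.empty() || unit == "b") {
    multiplier = 1;
  } else if (unit == "k" || unit == "kb") {
    multiplier = 1ULL << 10;
  } else if (unit == "m" || unit == "mb") {
    multiplier = 1ULL << 20;
  } else if (unit == "g" || unit == "gb") {
    multiplier = 1ULL << 30;
  } else {
    return std::nullopt;  // unknown unit
  }
  if (value > kMax / multiplier) {
    return std::nullopt;
  }
  return value * multiplier;
}

// Builds the data argument for a hypothetical
// mount("shm", "/dev/shm", "tmpfs", flags, data) call.
std::optional<std::string> shmMountOptions(const std::string& size) {
  const std::optional<std::uint64_t> bytes = parseSize(size);
  if (!bytes) {
    return std::nullopt;
  }
  return "mode=1777,size=" + std::to_string(*bytes);
}
```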





[jira] [Commented] (MESOS-5886) FUTURE_DISPATCH may react on irrelevant dispatch.

2017-08-03 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113692#comment-16113692
 ] 

Anand Mazumdar commented on MESOS-5886:
---

[~abudnik] Can you find a shepherd for this? I am retargeting this for 1.5.0.

> FUTURE_DISPATCH may react on irrelevant dispatch.
> -
>
> Key: MESOS-5886
> URL: https://issues.apache.org/jira/browse/MESOS-5886
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.2, 1.2.1, 1.3.0, 1.4.0
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: mesosphere, tech-debt, tech-debt-test
>
> [{{FUTURE_DISPATCH}}|https://github.com/apache/mesos/blob/e8ebbe5fe4189ef7ab046da2276a6abee41deeb2/3rdparty/libprocess/include/process/gmock.hpp#L50]
>  uses 
> [{{DispatchMatcher}}|https://github.com/apache/mesos/blob/e8ebbe5fe4189ef7ab046da2276a6abee41deeb2/3rdparty/libprocess/include/process/gmock.hpp#L350]
>  to figure out whether a processed {{DispatchEvent}} is the one the user is 
> waiting for. However, comparing the {{std::type_info}} of function pointers is 
> not enough: different class methods with the same signature will be matched. 
> Here is a test that demonstrates this:
> {noformat}
> class DispatchProcess : public Process<DispatchProcess>
> {
> public:
>   MOCK_METHOD0(func0, void());
>   MOCK_METHOD1(func1, bool(bool));
>   MOCK_METHOD1(func1_same_but_different, bool(bool));
>   MOCK_METHOD1(func2, Future<bool>(bool));
>   MOCK_METHOD1(func3, int(int));
>   MOCK_METHOD2(func4, Future<bool>(bool, int));
> };
> {noformat}
> {noformat}
> TEST(ProcessTest, DispatchMatch)
> {
>   DispatchProcess process;
>   PID<DispatchProcess> pid = spawn(&process);
>   Future<Nothing> future = FUTURE_DISPATCH(
>       pid,
>       &DispatchProcess::func1_same_but_different);
>   EXPECT_CALL(process, func1(_))
>     .WillOnce(ReturnArg<0>());
>   dispatch(pid, &DispatchProcess::func1, true);
>   AWAIT_READY(future);
>   terminate(pid);
>   wait(pid);
> }
> {noformat}
> The test passes:
> {noformat}
> [ RUN  ] ProcessTest.DispatchMatch
> [   OK ] ProcessTest.DispatchMatch (1 ms)
> {noformat}
> This change was introduced in https://reviews.apache.org/r/28052/.
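The root cause can be reproduced without libprocess. In this C++ sketch (a plain struct, not the mocked process above), the {{std::type_info}} of two member-function pointers with the same signature compares equal, while the pointers themselves do not. Comparing pointer values is one way to tell the methods apart, though this is not necessarily how libprocess addresses it:

```cpp
#include <cassert>
#include <typeinfo>

// Stand-in class with two methods of the identical signature bool(bool).
struct DispatchProcess {
  bool func1(bool b) { return b; }
  bool func1_same_but_different(bool b) { return !b; }
};

// What a type_info-based matcher effectively compares: only the method type,
// here bool (DispatchProcess::*)(bool).
bool typesMatch() {
  return typeid(&DispatchProcess::func1) ==
         typeid(&DispatchProcess::func1_same_but_different);
}

// A stricter comparison: the member-function pointers themselves.
bool pointersMatch() {
  return &DispatchProcess::func1 == &DispatchProcess::func1_same_but_different;
}
```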





[jira] [Updated] (MESOS-5886) FUTURE_DISPATCH may react on irrelevant dispatch.

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5886:
--
Target Version/s: 1.5.0  (was: 1.4.0)






[jira] [Updated] (MESOS-6394) Improvements to partition-aware Mesos frameworks.

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6394:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Improvements to partition-aware Mesos frameworks.
> -
>
> Key: MESOS-6394
> URL: https://issues.apache.org/jira/browse/MESOS-6394
> Project: Mesos
>  Issue Type: Epic
>  Components: master
>Reporter: Alexander Rukletsov
>Assignee: Neil Conway
>  Labels: mesosphere
>
> This is a follow up epic to MESOS-5344 to capture further improvements and 
> changes that need to be made to the MVP.





[jira] [Updated] (MESOS-7563) Make the HTTP command executor the default implementation.

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7563:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Make the HTTP command executor the default implementation.
> --
>
> Key: MESOS-7563
> URL: https://issues.apache.org/jira/browse/MESOS-7563
> Project: Mesos
>  Issue Type: Epic
>Reporter: Anand Mazumdar
>
> This epic tracks the work needed to make HTTP command executors the default, 
> i.e., to enable the {{http_command_executor}} flag by default. Currently, all 
> command executors use the old executor driver implementation. With this flag 
> always enabled, the command executors would use the v1 HTTP API.





[jira] [Updated] (MESOS-7474) Mesos Fetcher Cache Doesn't Retry when Missed

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7474:
--
Labels: mesosphere  (was: )

> Mesos Fetcher Cache Doesn't Retry when Missed
> -
>
> Key: MESOS-7474
> URL: https://issues.apache.org/jira/browse/MESOS-7474
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.2.0
>Reporter: Miguel Bernadin
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> The Mesos fetcher doesn't retry when an expected cache entry is missing. It should 
> be able to fall back to fetching from the original source when cache retrieval fails.
> {noformat}
> 421 15:52:53.022902 32751 fetcher.cpp:498] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/","items":[{"action":"RETRIEVE_FROM_CACHE","cache_filename":")","uri":{"cache":true,"executable":false,"extract":true,"value":"https:\/\/\/"}}],"sandbox_directory":"\/var\/lib\/mesos\/slave\/slaves\/\/frameworks\\/executors\/name\/runs\/"}
> I0421 15:52:53.024926 32751 fetcher.cpp:409] Fetching URI '"https:\/\/\/"
> I0421 15:52:53.024942 32751 fetcher.cpp:306] Fetching from cache
> I0421 15:52:53.024958 32751 fetcher.cpp:84] Extracting with command: tar -C "\/var\/lib\/mesos\/slave\/slaves\/\/frameworks\\/executors\/name\/runs\/' -xf '/tmp/mesos/fetch/slaves/f3feeab8-a2fe-4ac1-afeb-ec7bd4ce7b0d-S29/c1-docker-hub.tar.gz'
> tar: /"https:\/\/\/": Cannot open: No such file or directory
> tar: Error is not recoverable: exiting now
> Failed to fetch '"https:\/\/\/"': Failed to extract: command tar -C '"\/var\/lib\/mesos\/slave\/slaves\/\/frameworks\\/executors\/name\/runs\/' -xf '/tmp/mesos/fetch/slaves/"' exited with status: 512
> {noformat}



[jira] [Assigned] (MESOS-5882) `os::cloexec` does not exist on Windows

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-5882:
-

Assignee: (was: Joseph Wu)

> `os::cloexec` does not exist on Windows
> ---
>
> Key: MESOS-5882
> URL: https://issues.apache.org/jira/browse/MESOS-5882
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>  Labels: mesosphere, stout
>
> `os::cloexec` does not work on Windows, and it never will at the OS level. 
> Because of this, there are likely many important and hard-to-detect bugs 
> lurking in the agent.
> This is extremely important to fix. Some possible solutions to investigate 
> (some of which are _extremely_ risky):
> * Abstract file descriptors into a class, and implement cloexec in that class 
> on Windows (since we can't rely on the OS to do it).
> * Refactor all the code that relies on `os::cloexec` so that it no longer 
> relies on it.
> Of the two, the first seems less risky in the short term, because the cloexec 
> code only affects Windows. Depending on the semantics of the implementation 
> of the `FileDescriptor` class, it may be riskier for Windows in the longer 
> term, as the semantics of `cloexec` may differ subtly between Linux and 
> Windows.
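A minimal sketch of the first option, assuming a hypothetical `FileDescriptor` wrapper and `inheritable()` helper (neither exists in stout): the close-on-exec intent is tracked in user space, and a launcher consults it when deciding which descriptors a child may inherit. On Windows the actual mechanism would be per-handle inheritance (for example, clearing `HANDLE_FLAG_INHERIT` with `SetHandleInformation`); the sketch only models the bookkeeping.

```cpp
#include <cassert>
#include <vector>

// Hypothetical wrapper (not an existing stout type). Windows has no
// FD_CLOEXEC, so the close-on-exec intent is recorded in user space.
class FileDescriptor
{
public:
  explicit FileDescriptor(int fd, bool cloexec = false)
    : fd_(fd), cloexec_(cloexec) {}

  int get() const { return fd_; }
  bool cloexec() const { return cloexec_; }
  void setCloexec(bool value) { cloexec_ = value; }

private:
  int fd_;
  bool cloexec_;
};

// Hypothetical launcher helper: returns the descriptors a child process
// should inherit, i.e. those NOT marked close-on-exec. A Windows launcher
// could then mark only the matching handles as inheritable.
std::vector<int> inheritable(const std::vector<FileDescriptor>& fds)
{
  std::vector<int> result;
  for (const FileDescriptor& fd : fds) {
    if (!fd.cloexec()) {
      result.push_back(fd.get());
    }
  }
  return result;
}
```

The risk noted above shows up here directly: correctness depends on every code path going through the wrapper, which the OS cannot enforce.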



[jira] [Updated] (MESOS-7428) Report exit code of tasks from default and command executors

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7428:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Report exit code of tasks from default and command executors
> 
>
> Key: MESOS-7428
> URL: https://issues.apache.org/jira/browse/MESOS-7428
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> Use case: some tasks should only be retried if the exit code matches certain 
> user-specified criteria.
> According to [~gilbert], we already checkpoint the exit code in the containerizer, 
> and we need to clarify how to report exit codes for executor containers 
> vs. nested containers, and we should do this consistently for the command and 
> default executors.
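A minimal POSIX sketch of the use case, assuming the checkpointed value is a raw `waitpid()`-style status (an assumption; the description does not say what form is stored). `shouldRetry` is a hypothetical policy helper, not a Mesos API: it decodes the status with `WIFEXITED`/`WEXITSTATUS` and retries only on user-listed exit codes.

```cpp
#include <cassert>
#include <set>
#include <sys/wait.h>

// Hypothetical retry policy: retry only if the task exited normally with
// one of the user-specified exit codes. A task terminated by a signal is
// never retried by this policy.
bool shouldRetry(int waitStatus, const std::set<int>& retryableCodes)
{
  if (!WIFEXITED(waitStatus)) {
    return false;
  }
  return retryableCodes.count(WEXITSTATUS(waitStatus)) > 0;
}
```

Treating signal-terminated tasks as non-retryable is a choice made for the sketch; a real policy would likely need to report the two cases distinctly, which is part of what this ticket asks to clarify.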



[jira] [Commented] (MESOS-7428) Report exit code of tasks from default and command executors

2017-08-03 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113689#comment-16113689
 ] 

Anand Mazumdar commented on MESOS-7428:
---

Retargeting for 1.5.0




[jira] [Updated] (MESOS-7317) Add master endpoint to deactivate / activate agent

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7317:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Add master endpoint to deactivate / activate agent
> --
>
> Key: MESOS-7317
> URL: https://issues.apache.org/jira/browse/MESOS-7317
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Reporter: Neil Conway
>  Labels: mesosphere
>
> This would allow the operator to deactivate and then subsequently activate an 
> agent. The allocator does not make offers for deactivated agents; this 
> functionality would be useful to help operators "manually (incrementally) 
> drain" the tasks running on an agent, e.g., before taking the agent down.
> At present, if the operator causes a framework to kill a task running on the 
> agent, the framework will often receive an offer for the unused resources on 
> the agent, which will often result in respawning the killed task on the same 
> agent.
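A toy sketch of the allocator behavior described above, using a hypothetical `Agent` record rather than the real allocator types: a deactivated agent keeps its running tasks, but its idle resources are excluded from new offers, so a killed task cannot immediately be respawned there.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified agent record (not the real allocator type).
struct Agent
{
  std::string id;
  bool active;
  double freeCpus;
};

// Returns the IDs of agents whose idle resources may be offered.
// Deactivated agents are skipped even when they have idle resources.
std::vector<std::string> offerableAgents(const std::vector<Agent>& agents)
{
  std::vector<std::string> result;
  for (const Agent& agent : agents) {
    if (agent.active && agent.freeCpus > 0) {
      result.push_back(agent.id);
    }
  }
  return result;
}
```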



[jira] [Updated] (MESOS-7103) Container Attach/Exec Improvements

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7103:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Container Attach/Exec Improvements
> --
>
> Key: MESOS-7103
> URL: https://issues.apache.org/jira/browse/MESOS-7103
> Project: Mesos
>  Issue Type: Epic
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: tech-debt
>
> Most of the core changes required to add "container exec" and "container 
> attach" support to Mesos landed in the 1.2 release. However, some features 
> (such as actually integrating this support into the CLI) haven't quite landed 
> yet.
> This Epic aims to capture the tickets that still need to be resolved before 
> we can consider work on this feature complete. It is targeted for the 1.3 
> release.



[jira] [Commented] (MESOS-7033) Update documentation for hierarchical roles.

2017-08-03 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113687#comment-16113687
 ] 

Anand Mazumdar commented on MESOS-7033:
---

[~bmahler] [~neilc] Do we plan to work on it for the 1.4.0 release? I removed 
the target version for now.

> Update documentation for hierarchical roles.
> 
>
> Key: MESOS-7033
> URL: https://issues.apache.org/jira/browse/MESOS-7033
> Project: Mesos
>  Issue Type: Task
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, multitenancy
>
> A few things to be sure to cover:
> * How to ensure that a volume is not shared with other frameworks. 
> Previously, this meant running only one framework in the role and using ACLs to 
> prevent other frameworks from running in the role. With hierarchical roles, 
> this now also includes using ACLs to prevent any child roles from being 
> created beneath the role (as these children would be able to obtain the 
> reserved resources). We've been advising frameworks to generate a role (e.g. 
> eng/kafka/) to ensure that they own their reservations (but the 
> dynamic nature of this makes setting up ACLs difficult). Longer term, we may 
> need a more explicit way to bind reservations or volumes to frameworks.
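A small sketch of why child roles matter, assuming roles are `/`-separated paths and that a reservation for a role can be obtained by that role and any role nested beneath it, as the description states. `canObtain` is a hypothetical helper, and the role names are illustrative.

```cpp
#include <cassert>
#include <string>

// Returns true if `role` is `reservationRole` itself or nested beneath it.
// Requiring a '/' after the prefix keeps "eng/kafka2" from matching
// "eng/kafka".
bool canObtain(const std::string& role, const std::string& reservationRole)
{
  if (role == reservationRole) {
    return true;
  }
  return role.size() > reservationRole.size() &&
         role.compare(0, reservationRole.size(), reservationRole) == 0 &&
         role[reservationRole.size()] == '/';
}
```

This is why an ACL that only restricts which frameworks may register in a role is not sufficient: any principal allowed to create a child role also passes this check.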



[jira] [Updated] (MESOS-7033) Update documentation for hierarchical roles.

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7033:
--
Target Version/s:   (was: 1.4.0)




[jira] [Updated] (MESOS-6667) Update vendored ZooKeeper to 3.4.9

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6667:
--
Target Version/s:   (was: 1.4.0)

> Update vendored ZooKeeper to 3.4.9
> --
>
> Key: MESOS-6667
> URL: https://issues.apache.org/jira/browse/MESOS-6667
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>  Labels: mesosphere
>
> 3.4.9 has a few notable fixes for the C client library.



[jira] [Commented] (MESOS-7404) Ensure hierarchical roles work with old Mesos agents

2017-08-03 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113682#comment-16113682
 ] 

Anand Mazumdar commented on MESOS-7404:
---

[~bmahler] [~neilc] Do we plan to work on it for the 1.4.0 release? If not, 
please retarget it for 1.5.0.

> Ensure hierarchical roles work with old Mesos agents
> 
>
> Key: MESOS-7404
> URL: https://issues.apache.org/jira/browse/MESOS-7404
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> If the Mesos master supports hierarchical roles but the agent does not, we 
> need to ensure that we avoid putting the agent into a bad state, e.g., if the 
> user creates a persistent volume.
> One approach is to use an agent capability for hierarchical roles, and 
> disallow creating persistent volumes in a hierarchical role if the agent 
> doesn't have the capability. We could also use an agent version check, 
> although until MESOS-6975 is implemented, that will be a bit awkward.
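A sketch of the capability-based approach using hypothetical types: the agent's advertised capabilities are modeled as a set of strings, and the capability name `HIERARCHICAL_ROLE` is illustrative here. Non-hierarchical roles are always allowed, so older agents keep working for existing roles.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical set of capabilities an agent advertises when registering.
using Capabilities = std::set<std::string>;

// A role is hierarchical if it contains a '/' separator.
bool isHierarchical(const std::string& role)
{
  return role.find('/') != std::string::npos;
}

// Master-side validation: reject creating a persistent volume in a
// hierarchical role on an agent that does not advertise support for it.
bool canCreateVolume(const std::string& role, const Capabilities& agentCaps)
{
  return !isHierarchical(role) || agentCaps.count("HIERARCHICAL_ROLE") > 0;
}
```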



[jira] [Commented] (MESOS-7691) Support local enabled cgroups subsystems automatically.

2017-08-03 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113680#comment-16113680
 ] 

Anand Mazumdar commented on MESOS-7691:
---

[~gilbert] Do you plan to work on it for the 1.4.0 release? If not, please 
retarget it for 1.5.0.

> Support local enabled cgroups subsystems automatically.
> ---
>
> Key: MESOS-7691
> URL: https://issues.apache.org/jira/browse/MESOS-7691
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: cgroups
>
> Currently, each cgroup subsystem needs to be turned on as an isolator, e.g., 
> "cgroups/blkio". Ideally, Mesos should be able to detect all locally enabled 
> cgroup subsystems and turn them on automatically (which we call "auto cgroups").
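A sketch of the detection step on a cgroups v1 host, assuming `/proc/cgroups` is the source of truth and that each enabled subsystem maps one-to-one to an isolator named `cgroups/<subsystem>`. That mapping is a simplification, since not every kernel subsystem has a corresponding Mesos isolator. The function takes the file's contents as a string to keep it testable.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parses the contents of /proc/cgroups (columns: subsys_name, hierarchy,
// num_cgroups, enabled) and returns an isolator name such as
// "cgroups/blkio" for every subsystem the kernel reports as enabled.
std::vector<std::string> enabledCgroupIsolators(const std::string& procCgroups)
{
  std::vector<std::string> isolators;
  std::istringstream lines(procCgroups);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.empty() || line[0] == '#') {
      continue;  // Skip the header line.
    }
    std::istringstream fields(line);
    std::string name;
    int hierarchy = 0, numCgroups = 0, enabled = 0;
    if (fields >> name >> hierarchy >> numCgroups >> enabled && enabled == 1) {
      isolators.push_back("cgroups/" + name);
    }
  }
  return isolators;
}
```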



[jira] [Updated] (MESOS-7150) Support delegating quota to role subtrees

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7150:
--
Target Version/s:   (was: 1.4.0)

> Support delegating quota to role subtrees
> -
>
> Key: MESOS-7150
> URL: https://issues.apache.org/jira/browse/MESOS-7150
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Neil Conway
>  Labels: mesosphere
>
> If a quota is set on a role {{x}}, those resources should be offered to any 
> framework registered in role {{x}} _or any nested role in the subtree under 
> x_. For example, setting a quota on {{eng}} should result in resources that 
> are available for use by {{eng}}, {{eng/dev}} and {{eng/prod}}, even if the 
> latter two roles do not have quotas of their own.
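A sketch of the delegation rule, assuming quotas live in a hypothetical map from role name to a single scalar (CPUs, say): a role without its own quota is accounted against the nearest ancestor that has one. `quotaRoleFor` is illustrative only.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Walks from `role` up through its ancestors ("eng/dev" -> "eng") and
// returns the first role that has a quota set, if any.
std::optional<std::string> quotaRoleFor(
    std::string role, const std::map<std::string, double>& quotas)
{
  while (true) {
    if (quotas.count(role) > 0) {
      return role;
    }
    const std::string::size_type slash = role.rfind('/');
    if (slash == std::string::npos) {
      return std::nullopt;  // Reached the top-level role without a quota.
    }
    role = role.substr(0, slash);
  }
}
```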



[jira] [Assigned] (MESOS-4794) Add documentation around using the docker containerizer on CentOS 6.

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-4794:
-

Assignee: (was: Joseph Wu)

> Add documentation around using the docker containerizer on CentOS 6.
> 
>
> Key: MESOS-4794
> URL: https://issues.apache.org/jira/browse/MESOS-4794
> Project: Mesos
>  Issue Type: Documentation
>  Components: docker, documentation
>Affects Versions: 0.28.0
>Reporter: Joseph Wu
>  Labels: containerizer, docker, documentation, mesosphere
>
> Support for persistent volumes was added to the docker containerizer in 
> [MESOS-3413].  However, this does not work on CentOS 6.
> On CentOS 6, the same {{docker run -v ...}} operation does not perform a 
> recursive bind, whereas on every other OS supported by Mesos, docker does a 
> recursive bind.
> Docker has already [dropped support for CentOS 6|https://github.com/docker/docker/issues/14365], 
> so we should add precautionary documentation in case anyone tries to use the 
> docker containerizer on CentOS 6.



[jira] [Updated] (MESOS-7564) Introduce a heartbeat mechanism for v1 HTTP executor <-> agent communication.

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7564:
--
Target Version/s:   (was: 1.4.0)

> Introduce a heartbeat mechanism for v1 HTTP executor <-> agent communication.
> -
>
> Key: MESOS-7564
> URL: https://issues.apache.org/jira/browse/MESOS-7564
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>
> Currently, we do not have heartbeats for executor <-> agent communication. 
> This is especially problematic when IPFilters are enabled, since the default 
> conntrack timeout for established TCP connections is 5 days. Once that 
> timeout elapses, the executor is not notified via a socket disconnection 
> when the agent process restarts. The executor would then be killed if it 
> has not re-registered by the time agent recovery completes.
> Enabling application-level heartbeats or TCP keepalives is one possible 
> way to fix this issue.
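A minimal sketch of the TCP keepalive option on Linux, using the standard `SO_KEEPALIVE` socket option plus the Linux-specific `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT`. The helper name and the probe values are assumptions. The idea is that periodic probes keep the conntrack entry fresh and surface a dead peer as a socket error.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper: enables keepalive probes on a TCP socket. After
// `idleSecs` of silence, a probe is sent every `intervalSecs`; after
// `probes` unanswered probes the kernel reports the connection as dead.
// TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux-specific options.
// Returns false if any setsockopt() call fails.
bool enableKeepalive(int fd, int idleSecs, int intervalSecs, int probes)
{
  const int on = 1;
  return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0 &&
         setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                    &idleSecs, sizeof(idleSecs)) == 0 &&
         setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                    &intervalSecs, sizeof(intervalSecs)) == 0 &&
         setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                    &probes, sizeof(probes)) == 0;
}
```

Any idle value well under the 5-day conntrack timeout would keep the entry alive; the trade-off is extra traffic per idle executor connection.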



[jira] [Updated] (MESOS-7568) Introduce a heartbeat mechanism for v0 executor <-> agent links.

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7568:
--
Target Version/s:   (was: 1.4.0)

> Introduce a heartbeat mechanism for v0 executor <-> agent links.
> 
>
> Key: MESOS-7568
> URL: https://issues.apache.org/jira/browse/MESOS-7568
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>
> Currently, we do not have heartbeats for executor <-> agent communication. 
> This is especially problematic when IPFilters are enabled, since the default 
> conntrack timeout for established TCP connections is 5 days. Once that 
> timeout elapses, the executor is not notified via a socket disconnection 
> when the agent process restarts. The executor would then be killed if it 
> has not re-registered by the time agent recovery completes.
> Enabling application-level heartbeats or TCP keepalives is one possible 
> way to fix this issue.
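A sketch of the application-level alternative, independent of libprocess: a hypothetical `HeartbeatMonitor` declares the link dead after a configurable number of missed heartbeat intervals, so an executor could reconnect or re-register instead of waiting on a socket that will never close. Timestamps are passed in explicitly to keep it testable.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical liveness tracker for one executor <-> agent link.
class HeartbeatMonitor
{
public:
  using Clock = std::chrono::steady_clock;

  HeartbeatMonitor(Clock::duration interval, int maxMissed,
                   Clock::time_point now)
    : interval_(interval), maxMissed_(maxMissed), lastSeen_(now) {}

  // Call whenever a heartbeat (or any other message) arrives from the peer.
  void received(Clock::time_point now) { lastSeen_ = now; }

  // The link is considered dead once more than `maxMissed` intervals have
  // elapsed without hearing from the peer.
  bool expired(Clock::time_point now) const
  {
    return now - lastSeen_ > interval_ * maxMissed_;
  }

private:
  Clock::duration interval_;
  int maxMissed_;
  Clock::time_point lastSeen_;
};
```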



[jira] [Updated] (MESOS-5827) Add example framework for using inverse offers

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5827:
--
Labels: mesosphere newbie  (was: newbie)

> Add example framework for using inverse offers
> --
>
> Key: MESOS-5827
> URL: https://issues.apache.org/jira/browse/MESOS-5827
> Project: Mesos
>  Issue Type: Task
>Reporter: Artem Harutyunyan
>Assignee: Joseph Wu
>Priority: Minor
>  Labels: mesosphere, newbie
>
> We should have an example framework (in src/examples) demonstrating how to 
> handle inverse offers. 



[jira] [Assigned] (MESOS-3449) Expand the range of integer precision in json <-> protobuf conversions to include unsigned integers

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-3449:
-

Assignee: (was: Joseph Wu)

> Expand the range of integer precision in json <-> protobuf conversions to 
> include unsigned integers
> ---
>
> Key: MESOS-3449
> URL: https://issues.apache.org/jira/browse/MESOS-3449
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Priority: Minor
>  Labels: json, mesosphere, protobuf
>
> The previous changes (MESOS-3345) to support integer precision when 
> converting JSON <-> Protobuf did not support precision for unsigned integers 
> between {{INT64_MAX}} and {{UINT64_MAX}}.  (There's some loss, but the 
> conversion is still as good/bad as it was with doubles.)
> This problem is due to a limitation in the JSON parsing library we use 
> (PicoJSON), which parses integers as {{int64_t}}.
> Some possible solutions or things to investigate:
> * We can patch PicoJSON to parse some large values as {{uint64_t}}.
> * We can investigate using another parsing library.
> * If we want extra precision beyond 64 or 80 bits per double, one possibility 
> is the [GMP library|https://gmplib.org/].  We'd still need to change the 
> parsing library though.
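To make the precision gap concrete: a value such as 2^63 + 1 does not fit in `int64_t`, and converting it to `double` (53-bit significand) rounds it to 2^63. Below is a minimal sketch of parsing a JSON integer literal as `uint64_t` with explicit overflow detection, as the first option would require. `parseUnsigned` is a hypothetical helper, not PicoJSON or stout code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Parses a non-empty decimal string into uint64_t. Returns nullopt on
// non-digit characters or if the value would exceed UINT64_MAX.
std::optional<uint64_t> parseUnsigned(const std::string& text)
{
  if (text.empty()) {
    return std::nullopt;
  }
  uint64_t value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') {
      return std::nullopt;
    }
    const uint64_t digit = static_cast<uint64_t>(c - '0');
    // Equivalent to checking value * 10 + digit <= UINT64_MAX,
    // without overflowing in the check itself.
    if (value > (std::numeric_limits<uint64_t>::max() - digit) / 10) {
      return std::nullopt;
    }
    value = value * 10 + digit;
  }
  return value;
}
```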



[jira] [Updated] (MESOS-6843) Fetcher should not assume stdout/stderr in the sandbox.

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6843:
--
Target Version/s: 1.5.0  (was: 1.4.0)

[~jieyu] Pushing this off to target 1.5.0. Please let me know if this is a 
blocker for 1.4.0.

> Fetcher should not assume stdout/stderr in the sandbox.
> ---
>
> Key: MESOS-6843
> URL: https://issues.apache.org/jira/browse/MESOS-6843
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Jie Yu
>Priority: Critical
>  Labels: mesosphere
>
> If a container logger is used, this assumption might not hold. For instance, 
> a journald logger might redirect all task logs to journald. In that case, the 
> fetcher log should also go to journald, rather than being written to 
> sandbox/stdout and sandbox/stderr.



[jira] [Updated] (MESOS-5807) Support job_object in subprocess on Windows.

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5807:
--
Shepherd: Joseph Wu

> Support job_object in subprocess on Windows.
> 
>
> Key: MESOS-5807
> URL: https://issues.apache.org/jira/browse/MESOS-5807
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>Assignee: Andrew Schwartzmeyer
>
> Currently, the command executor uses different code paths for POSIX and 
> Windows:
> {noformat}
> #ifndef __WINDOWS__
>   pid = launchTaskPosix(
>       command,
>       launcherDir,
>       user,
>       rootfs,
>       sandboxDirectory,
>       workingDirectory);
> #else
>   // A Windows process is started using the `CREATE_SUSPENDED` flag
>   // and is part of a job object. While the process handle is kept
>   // open the reap function will work.
>   PROCESS_INFORMATION processInformation = launchTaskWindows(
>       command,
>       rootfs);
>   pid = processInformation.dwProcessId;
>   ::ResumeThread(processInformation.hThread);
>   CloseHandle(processInformation.hThread);
>   processHandle = processInformation.hProcess;
> #endif
> {noformat}
> During a recent refactor (MESOS-5753), the command executor's POSIX path was 
> changed to reuse the `mesos-containerizer launch` helper to launch user tasks.
> If Subprocess supported job objects, we could get rid of this divergence in 
> the command executor. This would also allow us to support custom 
> executors on Windows.



[jira] [Assigned] (MESOS-5807) Support job_object in subprocess on Windows.

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-5807:
-

Assignee: Andrew Schwartzmeyer  (was: Joseph Wu)

Future work for [~andschwa], shepherded by [~kaysoky]




[jira] [Commented] (MESOS-4086) Containerizer logging modularization

2017-08-03 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113669#comment-16113669
 ] 

Adam B commented on MESOS-4086:
---

[~kaysoky] Let's see if we can close out this Epic and create a new one for 
further improvements.

> Containerizer logging modularization
> 
>
> Key: MESOS-4086
> URL: https://issues.apache.org/jira/browse/MESOS-4086
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization, modules
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: logging, mesosphere
>
> Executors and tasks are configured (via the various containerizers) to write 
> their output (stdout/stderr) to files ("stdout" and "stderr") on an agent's 
> disk.
> Unlike Master/Agent logs, executor/task logs are not attached to any formal 
> logging system, like {{glog}}.  As such, there is significant scope for 
> improvement.
> By introducing a module for logging, we can provide a common/programmatic way 
> to access and manage executor/task logs.  Modules could implement additional 
> sinks for logs, such as:
> * to the sandbox (the status quo),
> * to syslog,
> * to journald.
> This would also provide the hooks to deal with logging-related problems, such 
> as:
> * the (current) lack of log rotation,
> * searching through executor/task logs (e.g., via aggregation).
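A minimal sketch of size-based rotation, the kind of policy such a logging module could apply to a sandbox `stdout` file. `appendWithRotation` is hypothetical and is not the Mesos container logger API. It uses C++17 `std::filesystem` and relies on `rename` replacing an existing destination file, so the oldest copy is dropped once `maxFiles` rotated copies exist.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical size-based rotation: if appending `data` would push `log`
// past `maxBytes`, shift log.1 -> log.2, ..., then log -> log.1, keeping
// at most `maxFiles` rotated copies, before appending to a fresh `log`.
void appendWithRotation(const fs::path& log, const std::string& data,
                        std::uintmax_t maxBytes, int maxFiles)
{
  if (fs::exists(log) && fs::file_size(log) + data.size() > maxBytes) {
    for (int i = maxFiles - 1; i >= 1; --i) {
      const fs::path from = log.string() + "." + std::to_string(i);
      if (fs::exists(from)) {
        // Overwrites log.(i+1) if present, dropping the oldest copy.
        fs::rename(from, log.string() + "." + std::to_string(i + 1));
      }
    }
    fs::rename(log, log.string() + ".1");
  }
  std::ofstream out(log, std::ios::app);
  out << data;
}
```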



[jira] [Updated] (MESOS-3901) Enable Mesos to be able know when it is hosted behind a proxy with a URL prefix

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3901:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Enable Mesos to be able know when it is hosted behind a proxy with a URL 
> prefix
> ---
>
> Key: MESOS-3901
> URL: https://issues.apache.org/jira/browse/MESOS-3901
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Harpreet
>Assignee: haosdent
>Priority: Critical
>  Labels: mesosphere
>
> If Mesos is run behind a proxy with a URL prefix, e.g., 
> https://:/services/mesos (`/services/mesos` being the URL 
> prefix), sandboxes in the Mesos web UI don't load. This happens because, when 
> Mesos is accessed through a proxy at 
> https://:/services/mesos, Mesos requests the slave state 
> from 
> https://:/slave/20151110-232502-218431498-5050-1234-S1/slave(1)/state.json?jsonp=angular.callbacks._4.
> This URL is missing the /services/mesos path prefix, so the request fails. 
> Fixing this by rewriting URLs in the body of every response would not be a 
> clean solution and would be error-prone.
> After searching around a bit, we've learned that this is apparently a common 
> issue with webapps, because there is no standard specification for making 
> them aware of their base URL path. Some let you specify a base path 
> in configuration[1], others respect an X-Forwarded-Path header if a 
> proxy provides it[2], and others don't handle this at all. 
> It would be great to have explicit support for this in Mesos.
> [1] 
> http://search.cpan.org/~bobtfish/Catalyst-TraitFor-Request-ProxyBase-0.05/lib/Catalyst/TraitFor/Request/ProxyBase.pm
> [2] https://github.com/mattkenney/feedsquish/blob/master/rupta.py#L94
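A sketch of explicit prefix support, under the assumption that the prefix comes either from operator configuration or from a header set by the proxy (such as the X-Forwarded-Path header mentioned above). `basePath` and `withPrefix` are hypothetical helpers; the description does not establish that Mesos exposes either mechanism.

```cpp
#include <cassert>
#include <string>

// Joins a base path prefix (e.g. "/services/mesos") with a request path,
// avoiding duplicate or missing slashes between the two.
std::string withPrefix(std::string prefix, const std::string& path)
{
  while (!prefix.empty() && prefix.back() == '/') {
    prefix.pop_back();
  }
  if (path.empty() || path.front() != '/') {
    return prefix + "/" + path;
  }
  return prefix + path;
}

// Prefers an operator-configured prefix; otherwise falls back to a prefix
// forwarded by the proxy; an empty result means "no prefix".
std::string basePath(const std::string& configured, const std::string& forwarded)
{
  return !configured.empty() ? configured : forwarded;
}
```

Building every UI request URL through one helper like this avoids the fragile alternative of rewriting URLs in response bodies.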



[jira] [Commented] (MESOS-3901) Enable Mesos to be able know when it is hosted behind a proxy with a URL prefix

2017-08-03 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113666#comment-16113666
 ] 

Anand Mazumdar commented on MESOS-3901:
---

[~haosd...@gmail.com] Are you still working on this? Pushing this off to target 
1.5.0. Please let me know if this is a blocker for 1.4.0.




[jira] [Assigned] (MESOS-5261) Combine the internal::slave::Fetcher class and mesos-fetcher binary

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-5261:
-

Assignee: (was: Joseph Wu)

> Combine the internal::slave::Fetcher class and mesos-fetcher binary
> ---
>
> Key: MESOS-5261
> URL: https://issues.apache.org/jira/browse/MESOS-5261
> Project: Mesos
>  Issue Type: Task
>  Components: fetcher
>Reporter: Joseph Wu
>  Labels: fetcher, mesosphere
>
> After [MESOS-5259], the {{mesos-fetcher}} will no longer need to be a 
> separate binary and can be safely folded back into the agent process.  (It 
> was a separate binary because libcurl has synchronous/blocking calls.)  
> This will likely mean:
> * A change to the {{fetch}} continuation chain:
>   
> https://github.com/apache/mesos/blob/653eca74f1080f5f55cd5092423506163e65d402/src/slave/containerizer/fetcher.cpp#L315
> * This protobuf can be deprecated (or just removed):
>   
> https://github.com/apache/mesos/blob/653eca74f1080f5f55cd5092423506163e65d402/include/mesos/fetcher/fetcher.proto



[jira] [Assigned] (MESOS-5260) Extend the uri::Fetcher::Plugin interface to include a "fetchSize"

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-5260:
-

Assignee: (was: Joseph Wu)

> Extend the uri::Fetcher::Plugin interface to include a "fetchSize"
> --
>
> Key: MESOS-5260
> URL: https://issues.apache.org/jira/browse/MESOS-5260
> Project: Mesos
>  Issue Type: Task
>  Components: fetcher
>Reporter: Joseph Wu
>  Labels: fetcher, mesosphere
>
> In order to replace the {{mesos-fetcher}} binary with the {{uri::Fetcher}}, 
> each plugin must be able to determine or estimate the size of a download. The 
> Fetcher cache uses this when creating cache entries, among other things.
> The logic for each of the four {{Fetcher::Plugin}}s can be taken and 
> refactored from the existing fetcher.
> https://github.com/apache/mesos/blob/653eca74f1080f5f55cd5092423506163e65d402/src/slave/containerizer/fetcher.cpp#L267





[jira] [Assigned] (MESOS-5259) Refactor the mesos-fetcher binary to use the uri::Fetcher as a backend

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-5259:
-

Assignee: (was: Joseph Wu)

> Refactor the mesos-fetcher binary to use the uri::Fetcher as a backend
> --
>
> Key: MESOS-5259
> URL: https://issues.apache.org/jira/browse/MESOS-5259
> Project: Mesos
>  Issue Type: Task
>  Components: fetcher
>Reporter: Joseph Wu
>  Labels: fetcher, mesosphere
>
> This is an intermediate step for combining the {{mesos-fetcher}} binary and 
> {{uri::Fetcher}}.  
> The {{download}} method should be replaced with {{uri::Fetcher::fetch}}.
> https://github.com/apache/mesos/blob/653eca74f1080f5f55cd5092423506163e65d402/src/launcher/fetcher.cpp#L179
> Combining the two will:
> * Attach the {{uri::Fetcher}} to the existing Fetcher caching logic.
> * Remove some code duplication for downloading URIs.





[jira] [Updated] (MESOS-7123) Investigate splitting offer messages instead of sending a giant single resource offer message.

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7123:
--
Target Version/s:   (was: 1.4.0)

> Investigate splitting offer messages instead of sending a giant single 
> resource offer message.
> --
>
> Key: MESOS-7123
> URL: https://issues.apache.org/jira/browse/MESOS-7123
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, the Mesos master batches all the resource offers into a single 
> message and then sends it to the scheduler. However, for large clusters this 
> can be problematic, as the message can exceed the default maximum protobuf 
> message size (64 MB). When such a message reaches the scheduler, it is 
> dropped with a warning, followed by a failed invariant check.
> {noformat}
> [libprotobuf ERROR google/protobuf/io/coded_stream.cc:180] A protocol message 
> was rejected because it was too big (more than 67108864 bytes).  To increase 
> the limit (or to disable these warnings), see 
> CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
> F0213 21:33:57.658892 60996 sched.cpp:895] Check failed: offers.size() == 
> pids.size() (32664 vs. 0)
> *** Check failure stack trace: ***
> @ 0x7f8d1b4d69bd  (unknown)
> @ 0x7f8d1b4d8750  (unknown)
> @ 0x7f8d1b4d6582  (unknown)
> @ 0x7f8d1b4d90e9  (unknown)
> @ 0x7f8d1aaa646c  (unknown)
> @ 0x7f8d1aaa7df7  (unknown)
> @ 0x7f8d1aa8ee4a  (unknown)
> @ 0x7f8d1aa9d109  (unknown)
> @ 0x7f8d1b46e4e4  (unknown)
> @ 0x7f8d1b46e827  (unknown)
> @ 0x7f8e319b0220  (unknown)
> @ 0x7f8e3355ddc5  start_thread
> @ 0x7f8e32c62ced  __clone
> @  (nil)  (unknown)
> {noformat}
> Possible solutions are for the Mesos master to either batch the offers (e.g., 
> 100 offers per message) or use an N:1 mapping, i.e., 1 offer per message. 
> The batch size could be set via a master flag at startup, with a reasonable 
> default value.
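The batching option could look roughly like the C++ sketch below. It is only an illustration under stated assumptions: `Offer` is a stand-in struct rather than the Mesos protobuf, `splitIntoBatches` is a hypothetical helper, and the batch size is assumed to come from a new master flag that does not exist yet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a resource offer; the real Mesos type is a protobuf message.
struct Offer
{
  std::string id;
};

// Splits `offers` into consecutive batches of at most `batchSize` offers, so
// that each batch could be sent to the scheduler as a separate message.
// `batchSize` is assumed to come from a (hypothetical) master flag.
std::vector<std::vector<Offer>> splitIntoBatches(
    const std::vector<Offer>& offers,
    std::size_t batchSize)
{
  assert(batchSize > 0);

  std::vector<std::vector<Offer>> batches;
  for (std::size_t start = 0; start < offers.size(); start += batchSize) {
    const std::size_t end = std::min(start + batchSize, offers.size());
    batches.emplace_back(
        offers.begin() + static_cast<std::ptrdiff_t>(start),
        offers.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return batches;
}
```

With a batch size of 100, the 32664 offers from the log above would be sent as 327 messages instead of one; a batch size of 1 gives the one-offer-per-message variant.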





[jira] [Assigned] (MESOS-4872) Dump the contents of the sandbox when a test fails

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-4872:
-

Assignee: (was: Joseph Wu)

> Dump the contents of the sandbox when a test fails
> --
>
> Key: MESOS-4872
> URL: https://issues.apache.org/jira/browse/MESOS-4872
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Joseph Wu
>  Labels: mesosphere, newbie, test
>
> [~bernd-mesos] added this logic for extra info about a rare flaky test:
> https://github.com/apache/mesos/blob/d26baee1f377aedb148ad04cc004bb38b85ee4f6/src/tests/fetcher_cache_tests.cpp#L249-L259
> This information is useful regardless of the test type and should be 
> generalized for {{cluster::Slave}}, i.e.:
> # When a {{cluster::Slave}} is destroyed, it can detect whether the test has 
> failed.
> # If so, it navigates through its own {{work_dir}} and prints the sandboxes 
> and/or other useful debugging info.
> Also see the refactor in [MESOS-4634].
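The second step could be sketched as follows, assuming C++17 `<filesystem>`. `listSandbox` and `dumpSandbox` are hypothetical helpers, not existing Mesos test utilities; the failure detection from the first step (for example gtest's `::testing::Test::HasFailure()` in the destructor) is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <iostream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Returns every file and directory under `workDir`, relative to it, sorted so
// that the output is stable across runs. Returns an empty list if the
// directory cannot be opened; stops early on iteration errors.
std::vector<std::string> listSandbox(const fs::path& workDir)
{
  std::vector<std::string> entries;
  std::error_code ec;

  fs::recursive_directory_iterator it(
      workDir, fs::directory_options::skip_permission_denied, ec);
  if (ec) {
    return entries; // E.g. the work directory does not exist.
  }

  for (; it != fs::recursive_directory_iterator(); it.increment(ec)) {
    if (ec) {
      break;
    }
    entries.push_back(it->path().lexically_relative(workDir).generic_string());
  }

  std::sort(entries.begin(), entries.end());
  return entries;
}

// Prints the listing, e.g. from a destructor after detecting a failed test.
void dumpSandbox(const fs::path& workDir, std::ostream& out = std::cerr)
{
  out << "Contents of " << workDir << ":\n";
  for (const std::string& entry : listSandbox(workDir)) {
    out << "  " << entry << "\n";
  }
}
```

Sorting the entries keeps the dump identical between runs, which makes logs from flaky tests easier to compare.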





[jira] [Updated] (MESOS-6784) IOSwitchboardTest.KillSwitchboardContainerDestroyed is flaky

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6784:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> IOSwitchboardTest.KillSwitchboardContainerDestroyed is flaky
> 
>
> Key: MESOS-6784
> URL: https://issues.apache.org/jira/browse/MESOS-6784
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Neil Conway
>Priority: Critical
>  Labels: mesosphere
>
> {noformat}
> [ RUN  ] IOSwitchboardTest.KillSwitchboardContainerDestroyed
> I1212 13:57:02.641043  2211 containerizer.cpp:220] Using isolation: 
> posix/cpu,filesystem/posix,network/cni
> W1212 13:57:02.641438  2211 backend.cpp:76] Failed to create 'overlay' 
> backend: OverlayBackend requires root privileges, but is running as user nrc
> W1212 13:57:02.641559  2211 backend.cpp:76] Failed to create 'bind' backend: 
> BindBackend requires root privileges
> I1212 13:57:02.642822  2268 containerizer.cpp:594] Recovering containerizer
> I1212 13:57:02.643975  2253 provisioner.cpp:253] Provisioner recovery complete
> I1212 13:57:02.644953  2255 containerizer.cpp:986] Starting container 
> 09e87380-00ab-4987-83c9-fa1c5d86717f for executor 'executor' of framework
> I1212 13:57:02.647004  2245 switchboard.cpp:430] Allocated pseudo terminal 
> '/dev/pts/54' for container 09e87380-00ab-4987-83c9-fa1c5d86717f
> I1212 13:57:02.652305  2245 switchboard.cpp:596] Created I/O switchboard 
> server (pid: 2705) listening on socket file 
> '/tmp/mesos-io-switchboard-b4af1c92-6633-44f3-9d35-e0e36edaf70a' for 
> container 09e87380-00ab-4987-83c9-fa1c5d86717f
> I1212 13:57:02.655513  2267 launcher.cpp:133] Forked child with pid '2706' 
> for container '09e87380-00ab-4987-83c9-fa1c5d86717f'
> I1212 13:57:02.655732  2267 containerizer.cpp:1621] Checkpointing container's 
> forked pid 2706 to 
> '/tmp/IOSwitchboardTest_KillSwitchboardContainerDestroyed_Me5CRx/meta/slaves/frameworks/executors/executor/runs/09e87380-00ab-4987-83c9-fa1c5d86717f/pids/forked.pid'
> I1212 13:57:02.726306  2265 containerizer.cpp:2463] Container 
> 09e87380-00ab-4987-83c9-fa1c5d86717f has exited
> I1212 13:57:02.726352  2265 containerizer.cpp:2100] Destroying container 
> 09e87380-00ab-4987-83c9-fa1c5d86717f in RUNNING state
> E1212 13:57:02.726495  2243 switchboard.cpp:861] Unexpected termination of 
> I/O switchboard server: 'IOSwitchboard' exited with signal: Killed for 
> container 09e87380-00ab-4987-83c9-fa1c5d86717f
> I1212 13:57:02.726563  2265 launcher.cpp:149] Asked to destroy container 
> 09e87380-00ab-4987-83c9-fa1c5d86717f
> E1212 13:57:02.783607  2228 switchboard.cpp:799] Failed to remove unix domain 
> socket file '/tmp/mesos-io-switchboard-b4af1c92-6633-44f3-9d35-e0e36edaf70a' 
> for container '09e87380-00ab-4987-83c9-fa1c5d86717f': No such file or 
> directory
> ../../mesos/src/tests/containerizer/io_switchboard_tests.cpp:661: Failure
> Value of: wait.get()->reasons().size() == 1
>   Actual: false
> Expected: true
> *** Aborted at 1481579822 (unix time) try "date -d @1481579822" if you are 
> using GNU date ***
> PC: @  0x1bf16d0 testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 2211 (TID 0x7faed7d078c0) from PID 0; 
> stack trace: ***
> @ 0x7faecf855100 (unknown)
> @  0x1bf16d0 testing::UnitTest::AddTestPartResult()
> @  0x1be6247 testing::internal::AssertHelper::operator=()
> @  0x19ed751 
> mesos::internal::tests::IOSwitchboardTest_KillSwitchboardContainerDestroyed_Test::TestBody()
> @  0x1c0ed8c 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x1c09e74 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x1beb505 testing::Test::Run()
> @  0x1bebc88 testing::TestInfo::Run()
> @  0x1bec2ce testing::TestCase::Run()
> @  0x1bf2ba8 testing::internal::UnitTestImpl::RunAllTests()
> @  0x1c0f9b1 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x1c0a9f2 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x1bf18ee testing::UnitTest::Run()
> @  0x11bc9e3 RUN_ALL_TESTS()
> @  0x11bc599 main
> @ 0x7faece663b15 __libc_start_main
> @   0xa9c219 (unknown)
> {noformat}





[jira] [Commented] (MESOS-4872) Dump the contents of the sandbox when a test fails

2017-08-03 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113664#comment-16113664
 ] 

Adam B commented on MESOS-4872:
---

Unassigned. Please contact [~kaysoky] if you would like to work on this.






[jira] [Updated] (MESOS-6623) Re-enable tests impacted by request streaming support

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6623:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Re-enable tests impacted by request streaming support
> -
>
> Key: MESOS-6623
> URL: https://issues.apache.org/jira/browse/MESOS-6623
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, test
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Critical
>  Labels: mesosphere
>
> We added support for HTTP request streaming in libprocess as part of 
> MESOS-6466. However, this broke a few tests that relied on HTTP request 
> filtering since the handlers no longer have access to the body of the request 
> when {{visit()}} is invoked. We would need to revisit how we do HTTP request 
> filtering and then re-enable these tests.





[jira] [Commented] (MESOS-2153) Add support for systemd journal for logging

2017-08-03 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113663#comment-16113663
 ] 

Adam B commented on MESOS-2153:
---

No recent demand for this feature, unassigning and moving back to Open to 
re-triage later.

> Add support for systemd journal for logging
> ---
>
> Key: MESOS-2153
> URL: https://issues.apache.org/jira/browse/MESOS-2153
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: mesosphere
>
> We should be able to redirect master and slave logs to the systemd journal on 
> systems where it is available.





[jira] [Assigned] (MESOS-2153) Add support for systemd journal for logging

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-2153:
-

Assignee: (was: Joseph Wu)






[jira] [Updated] (MESOS-7123) Investigate splitting offer messages instead of sending a giant single resource offer message.

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7123:
--
Priority: Major  (was: Critical)

We recently upgraded protobuf to 3.3.0, which supports messages larger than 
64 MB. Reducing priority to "Major".






[jira] [Updated] (MESOS-7349) Document Mesos "check" feature.

2017-08-03 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7349:
--
Priority: Major  (was: Blocker)

> Document Mesos "check" feature.
> ---
>
> Key: MESOS-7349
> URL: https://issues.apache.org/jira/browse/MESOS-7349
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: documentation, mesosphere
>
> This should include recommendations for framework authors about how and when 
> to use general checks, as well as a comparison with health checks.





[jira] [Created] (MESOS-7857) `DISTCHECK_CONFIGURE_FLAGS` does not inherit flags from `configure`

2017-08-03 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-7857:
--

 Summary: `DISTCHECK_CONFIGURE_FLAGS` does not inherit flags from 
`configure`
 Key: MESOS-7857
 URL: https://issues.apache.org/jira/browse/MESOS-7857
 Project: Mesos
  Issue Type: Improvement
Reporter: Chun-Hung Hsiao
Priority: Trivial


When we run {{make distcheck}} in the following scenario:
{noformat}
../configure --with-ssl=/opt/openssl --with-zlib=/opt/zlib 
--with-protobuf=/opt/protobuf
make distcheck
{noformat}
It reports errors about not being able to find zlib, ssl, protobuf, etc., 
unless we run the following command instead:
{noformat}
make distcheck DISTCHECK_CONFIGURE_FLAGS="--with-ssl=/opt/openssl 
--with-zlib=/opt/zlib --with-protobuf=/opt/protobuf"
{noformat}
Making {{DISTCHECK_CONFIGURE_FLAGS}} inherit the flags passed to 
{{configure}} seems more natural and would improve the user experience.





[jira] [Updated] (MESOS-7798) Improve libprocess message passing performance

2017-08-03 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7798:
---
Sprint: Mesosphere Sprint 60

> Improve libprocess message passing performance
> --
>
> Key: MESOS-7798
> URL: https://issues.apache.org/jira/browse/MESOS-7798
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Hindman
>Assignee: Benjamin Hindman
> Attachments: perf-2x-lifo-message-cow.svg, perf-2x-lifo-message.svg, 
> perf-2x-lifo.svg, perf-2x-no-lifo.svg, perf-lifo-message-cow.svg, 
> perf-lifo-message.svg, perf-lifo.svg, perf-no-lifo.svg
>
>






[jira] [Updated] (MESOS-7349) Document Mesos "check" feature.

2017-08-03 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7349:
---
Sprint: Mesosphere Sprint 54, Mesosphere Sprint 55, Mesosphere Sprint 56, 
Mesosphere Sprint 61  (was: Mesosphere Sprint 54, Mesosphere Sprint 55, 
Mesosphere Sprint 56)

> Document Mesos "check" feature.
> ---
>
> Key: MESOS-7349
> URL: https://issues.apache.org/jira/browse/MESOS-7349
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: documentation, mesosphere
>
> This should include recommendations for framework authors about how and when 
> to use general checks, as well as a comparison with health checks.





[jira] [Updated] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically

2017-08-03 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7643:
---
Sprint:   (was: Mesosphere Sprint 60)

> The order of isolators provided in '--isolation' flag is not preserved and 
> instead sorted alphabetically
> 
>
> Key: MESOS-7643
> URL: https://issues.apache.org/jira/browse/MESOS-7643
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.2, 1.2.0, 1.3.0
>Reporter: Michael Cherny
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: isolation
>
> According to the documentation and comments in the code, the order of the 
> entries in the --isolation flag should specify the ordering of the isolators. 
> Specifically, the `create` and `prepare` calls for each isolator should run 
> serially in the order in which they appear in the --isolation flag, while the 
> `cleanup` calls should be serialized in reverse order (with the exception of 
> the filesystem isolator, which is always first).
> But in fact, the isolators provided in the '--isolation' flag are sorted 
> alphabetically.
> That happens in [this line of 
> code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377].
> That line uses a 'set' (apparently instead of a list or vector), and a set is 
> a sorted container.
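The difference is easy to see in isolation: a `std::set<std::string>` iterates in sorted order regardless of insertion order, whereas a vector that only uses a set to skip duplicates keeps the order given on the command line. This sketch assumes the flag value is a plain comma-separated string; `splitFlag`, `parseIsolationSorted` and `parseIsolationOrdered` are hypothetical helpers, not Mesos functions.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma-separated flag value such as "posix/cpu,filesystem/posix".
std::vector<std::string> splitFlag(const std::string& value)
{
  std::vector<std::string> tokens;
  std::istringstream stream(value);
  std::string token;
  while (std::getline(stream, token, ',')) {
    if (!token.empty()) {
      tokens.push_back(token);
    }
  }
  return tokens;
}

// Mirrors the reported behavior: a std::set sorts its elements, so the
// original order of the flag is lost.
std::vector<std::string> parseIsolationSorted(const std::string& value)
{
  const std::vector<std::string> tokens = splitFlag(value);
  const std::set<std::string> unique(tokens.begin(), tokens.end());
  return std::vector<std::string>(unique.begin(), unique.end());
}

// Keeps the first occurrence of each isolator in the order given, using the
// set only to detect duplicates.
std::vector<std::string> parseIsolationOrdered(const std::string& value)
{
  std::vector<std::string> ordered;
  std::set<std::string> seen;
  for (const std::string& token : splitFlag(value)) {
    if (seen.insert(token).second) {
      ordered.push_back(token);
    }
  }
  return ordered;
}
```

The ordered variant matches the documented semantics for `create` and `prepare`; the `cleanup` order would then be the reverse of the same vector.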





[jira] [Updated] (MESOS-7840) Add Mesos CLI command to list active tasks

2017-08-03 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7840:
--
Sprint: Mesosphere Sprint 61

> Add Mesos CLI command to list active tasks
> --
>
> Key: MESOS-7840
> URL: https://issues.apache.org/jira/browse/MESOS-7840
> Project: Mesos
>  Issue Type: Improvement
>  Components: cli
>Reporter: Armand Grillet
>Assignee: Armand Grillet
>
> We need to add a command to list all the tasks running in a Mesos cluster by 
> checking the endpoint {{/tasks}} and reporting the results.





[jira] [Updated] (MESOS-7840) Add Mesos CLI command to list active tasks

2017-08-03 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7840:
--
Sprint:   (was: Mesosphere Sprint 60)






[jira] [Updated] (MESOS-7814) Improve the test frameworks.

2017-08-03 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7814:
--
Sprint: Mesosphere Sprint 61

> Improve the test frameworks.
> 
>
> Key: MESOS-7814
> URL: https://issues.apache.org/jira/browse/MESOS-7814
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework
>Reporter: Armand Grillet
>Assignee: Armand Grillet
>Priority: Minor
>  Labels: mesosphere, newbie
>
> These improvements include three main points:
> * Adding a {{name}} flag to certain frameworks to distinguish between 
> instances.
> * Cleaning up the code style of the frameworks.
> * For frameworks with custom executors, such as the balloon framework, adding an 
> {{executor_extra_uris}} flag containing URIs that will be passed to the 
> {{command_info}} of the executor.





[jira] [Updated] (MESOS-7814) Improve the test frameworks.

2017-08-03 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7814:
--
Sprint:   (was: Mesosphere Sprint 60)






[jira] [Updated] (MESOS-6950) Launching two tasks with the same Docker image simultaneously may cause a staging dir never cleaned up

2017-08-03 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6950:
--
Shepherd: Qian Zhang
  Sprint: Mesosphere Sprint 60
Story Points: 2

> Launching two tasks with the same Docker image simultaneously may cause a 
> staging dir never cleaned up
> --
>
> Key: MESOS-6950
> URL: https://issues.apache.org/jira/browse/MESOS-6950
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Qian Zhang
>Assignee: Gilbert Song
>
> If a user launches two tasks with the same Docker image simultaneously (e.g., 
> runs {{mesos-executor}} twice with the same Docker image), the staging 
> directory for the second task is never cleaned up, like this:
> {code}
> └── store
> └── docker
> ├── layers
> │...
> ├── staging
> │   └── a6rXWC
> └── storedImages
> {code}
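One common way to make sure a staging directory is removed on every exit path, including when a second, concurrent pull of the same image is discarded, is a scope guard. The sketch below illustrates only that idea, assuming C++17 `<filesystem>`; `StagingDir` is a hypothetical class, not the Mesos docker store code, and it does not address the underlying race itself.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical scope guard for a staging directory: the directory is removed
// when the guard goes out of scope unless commit() was called, so an
// abandoned or duplicate pull does not leave it behind.
class StagingDir
{
public:
  explicit StagingDir(fs::path path) : path_(std::move(path))
  {
    assert(!path_.empty());
    fs::create_directories(path_);
  }

  StagingDir(const StagingDir&) = delete;
  StagingDir& operator=(const StagingDir&) = delete;

  ~StagingDir()
  {
    if (!committed_) {
      std::error_code ec;
      fs::remove_all(path_, ec); // Best effort; destructors must not throw.
    }
  }

  const fs::path& path() const { return path_; }

  // Call after the staged contents have been moved into the store.
  void commit() { committed_ = true; }

private:
  fs::path path_;
  bool committed_ = false;
};
```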





[jira] [Updated] (MESOS-7695) Add heartbeats to master stream API

2017-08-03 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7695:
--
Shepherd: Anand Mazumdar  (was: Vinod Kone)

> Add heartbeats to master stream API
> ---
>
> Key: MESOS-7695
> URL: https://issues.apache.org/jira/browse/MESOS-7695
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: Vinod Kone
>Assignee: Quinn
>  Labels: newbie++
>
> Just as the master uses heartbeats on the scheduler API to keep the connection 
> alive, it should do the same for the streaming API.





[jira] [Commented] (MESOS-7312) Update Resource proto for storage resource providers.

2017-08-03 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112944#comment-16112944
 ] 

Benjamin Bannier commented on MESOS-7312:
-

Still remaining reviews:

https://reviews.apache.org/r/58048/
https://reviews.apache.org/r/58047/

> Update Resource proto for storage resource providers.
> -
>
> Key: MESOS-7312
> URL: https://issues.apache.org/jira/browse/MESOS-7312
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: storage
>
> Storage resource provider support requires a number of changes to the 
> {{Resource}} proto:
> * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
> * {{ResourceProviderID}} in Resource
> * {{Resource::DiskInfo::Source::Path}} should be {{optional}}.





[jira] [Commented] (MESOS-2092) Make ACLs dynamic

2017-08-03 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112941#comment-16112941
 ] 

Alexander Rojas commented on MESOS-2092:


[~saitejar] [~alexr] I don't think this issue is even necessary. The reason is 
that providing dynamic ACLs is relatively easy to do by creating a module that 
Mesos loads, and this kind of functionality is better built on top of Mesos 
rather than inside it.

> Make ACLs dynamic
> -
>
> Key: MESOS-2092
> URL: https://issues.apache.org/jira/browse/MESOS-2092
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Alexander Rukletsov
>Assignee: Yongqiao Wang
>  Labels: mesosphere, newbie
>
> Master loads ACLs once during its launch and there is no way to update them 
> in a running master. Making them dynamic will allow for updating ACLs on the 
> fly, for example granting a new framework necessary rights.





[jira] [Updated] (MESOS-7555) Add resource provider IDs to the registry

2017-08-03 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7555:

Remaining Estimate: (was: 5m)
 Original Estimate: (was: 5m)

> Add resource provider IDs to the registry
> -
>
> Key: MESOS-7555
> URL: https://issues.apache.org/jira/browse/MESOS-7555
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Benjamin Bannier
>  Labels: mesosphere, storage
>
> To support resource provider re-registration following a master fail-over, 
> the IDs of registered resource providers need to be kept in the registry.
> An operation to commit those IDs using the registrar needs to be added as 
> well.





[jira] [Commented] (MESOS-7806) Add copy assignment operator to `net::IP::Network`

2017-08-03 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112934#comment-16112934
 ] 

Avinash Sridharan commented on MESOS-7806:
--

commit cd3380c4e9521b4b26f9030658816eee7a4b89a1
Author: Avinash sridharan 
Date:   Mon Jul 24 18:24:51 2017 -0700

Added a test to check for copy assignment of `net::IP::Network`.

Review: https://reviews.apache.org/r/61005/


> Add copy assignment operator to `net::IP::Network`
> --
>
> Key: MESOS-7806
> URL: https://issues.apache.org/jira/browse/MESOS-7806
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
> Fix For: 1.4.0
>
>
> Currently, we can't extend the class `net::IP::Network` without adding a 
> copy assignment operator in the derived class, due to the use of 
> `std::unique_ptr` in the base class. Hence, we need to introduce a copy 
> assignment operator into the base class.
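The underlying C++ rule: because `std::unique_ptr` is move-only, a class holding one as a member gets implicitly deleted copy operations, and so does every class derived from it. The sketch below shows the usual fix of deep-copying the pointee in a user-defined copy constructor and copy assignment operator. `Network`, `Address` and `NamedNetwork` here are simplified, hypothetical stand-ins, not the actual stout types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for an IP address (hypothetical, not stout's type).
struct Address
{
  std::string value;
};

// Simplified stand-in for a network type that owns its address through a
// std::unique_ptr. The implicit copy operations would be deleted, so the
// user-defined versions below deep-copy the pointee instead.
class Network
{
public:
  Network(Address address, int prefix)
    : address_(std::make_unique<Address>(std::move(address))),
      prefix_(prefix) {}

  Network(const Network& that)
    : address_(std::make_unique<Address>(*that.address_)),
      prefix_(that.prefix_) {}

  Network& operator=(const Network& that)
  {
    if (this != &that) {
      // Allocate first so *this is unchanged if allocation throws.
      address_ = std::make_unique<Address>(*that.address_);
      prefix_ = that.prefix_;
    }
    return *this;
  }

  const Address& address() const { return *address_; }
  int prefix() const { return prefix_; }

private:
  std::unique_ptr<Address> address_;
  int prefix_;
};

// With the base class copyable, a derived class gets working implicit copy
// operations without declaring its own.
class NamedNetwork : public Network
{
public:
  NamedNetwork(std::string name, Address address, int prefix)
    : Network(std::move(address), prefix), name_(std::move(name)) {}

  const std::string& name() const { return name_; }

private:
  std::string name_;
};
```

A deep copy keeps each object the sole owner of its address, which preserves the `std::unique_ptr` ownership model; switching to `std::shared_ptr` would also make the class copyable, but copies would then share state.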





[jira] [Updated] (MESOS-7801) Retry logic for unsuccessful `docker rm` during agent recovery

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7801:
--
Sprint: Mesosphere Sprint 59  (was: Mesosphere Sprint 59, Mesosphere Sprint 
60)

> Retry logic for unsuccessful `docker rm` during agent recovery
> --
>
> Key: MESOS-7801
> URL: https://issues.apache.org/jira/browse/MESOS-7801
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.4.0
>
>
> In MESOS- we ignore the failure when `docker rm` fails due to mount leakage 
> during agent recovery. In order not to leave residual docker containers in 
> the docker daemon, we could retry `docker rm` on a best-effort basis with 
> exponential backoff, since we cannot control when the leakage will end.
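A best-effort retry schedule of this kind might look like the sketch below. It is hedged: `backoffDelays`, `retryWithBackoff` and their parameters (initial delay, cap, number of retries) are hypothetical, and the agent would schedule retries asynchronously rather than block, which is why waiting is injected as a callback here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

using std::chrono::milliseconds;

// Returns the delay before each retry: initial, 2*initial, 4*initial, ...,
// capped at `maxDelay`. `maxRetries` bounds the best-effort attempts.
std::vector<milliseconds> backoffDelays(
    milliseconds initial,
    milliseconds maxDelay,
    int maxRetries)
{
  assert(initial.count() > 0 && maxDelay >= initial && maxRetries >= 0);

  std::vector<milliseconds> delays;
  milliseconds delay = initial;
  for (int retry = 0; retry < maxRetries; ++retry) {
    delays.push_back(delay);
    delay = std::min(delay * 2, maxDelay);
  }
  return delays;
}

// Calls `attempt` (e.g. a wrapper around `docker rm`) once, then once after
// each delay until it succeeds. `wait` is injected so the caller decides how
// to wait; returns false if every attempt failed.
template <typename Attempt, typename Wait>
bool retryWithBackoff(
    Attempt attempt,
    Wait wait,
    const std::vector<milliseconds>& delays)
{
  if (attempt()) {
    return true;
  }
  for (milliseconds delay : delays) {
    wait(delay);
    if (attempt()) {
      return true;
    }
  }
  return false;
}
```

Capping the delay keeps a long-lived mount leak from pushing retries arbitrarily far apart, while the bounded retry count keeps the cleanup best-effort.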





[jira] [Updated] (MESOS-7007) filesystem/shared and --default_container_info broken since 1.1

2017-08-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7007:
--
Sprint: Mesosphere Sprint 57, Mesosphere Sprint 58  (was: Mesosphere Sprint 
57, Mesosphere Sprint 58, Mesosphere Sprint 60)

> filesystem/shared and --default_container_info broken since 1.1
> ---
>
> Key: MESOS-7007
> URL: https://issues.apache.org/jira/browse/MESOS-7007
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Pierre Cheynier
>Assignee: Chun-Hung Hsiao
>  Labels: storage
>
> I am facing an issue that prevents me from upgrading to 1.1.0 (the change 
> was introduced in that version):
> I'm using default_container_info to mount a /tmp volume in the container's 
> mount namespace from its current sandbox, meaning that each container has a 
> dedicated /tmp, thanks to the {{filesystem/shared}} isolator.
> I noticed through our automation pipeline that integration tests were failing 
> and found that this is because the contents of /tmp (the one on the host!) are 
> trashed each time a container is created.
> Here is my setup: 
> * 
> {{--isolation='cgroups/cpu,cgroups/mem,namespaces/pid,*disk/du,filesystem/shared,filesystem/linux*,docker/runtime'}}
> * 
> {{--default_container_info='\{"type":"MESOS","volumes":\[\{"host_path":"tmp","container_path":"/tmp","mode":"RW"\}\]\}'}}
> I discovered this issue in the early days of 1.1 (end of November; I spoke with 
> someone on Slack about it), but unfortunately had no time to dig into the 
> symptoms further.
> I found nothing interesting, even with GLOG_v=3.
> Maybe this is caused by a bad usage of isolators? If so, then at least the 
> documentation should be updated.
> Let me know if more information is needed.





[jira] [Updated] (MESOS-7555) Add resource provider IDs to the registry

2017-08-03 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7555:

Story Points: 5

> Add resource provider IDs to the registry
> -
>
> Key: MESOS-7555
> URL: https://issues.apache.org/jira/browse/MESOS-7555
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Benjamin Bannier
>  Labels: mesosphere, storage
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> To support resource provider re-registration following a master fail-over, 
> the IDs of registered resource providers need to be kept in the registry.
> An operation to commit those IDs using the registrar needs to be added as 
> well.
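A minimal C++ sketch of the idea, assuming a simplified in-memory registry. The names `Registry`, `AddResourceProviderId`, and `isKnownResourceProvider` are hypothetical and do not reflect the actual Mesos registry protobuf or registrar API. The sketch only shows an idempotent "commit ID" operation and the lookup a master could perform when a resource provider re-registers after a failover.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the registry: only the part
// relevant here, the set of registered resource provider IDs.
struct Registry
{
  std::set<std::string> resourceProviderIds;
};

// Sketch of a registrar-style operation that commits a resource
// provider ID. `apply` returns whether the registry was mutated, so a
// caller could skip persisting when the ID was already present.
class AddResourceProviderId
{
public:
  explicit AddResourceProviderId(std::string id) : id_(std::move(id)) {}

  bool apply(Registry* registry) const
  {
    return registry->resourceProviderIds.insert(id_).second;
  }

private:
  std::string id_;
};

// After a master failover, a re-registering resource provider could be
// recognized by checking the recovered registry.
bool isKnownResourceProvider(const Registry& registry, const std::string& id)
{
  return registry.resourceProviderIds.count(id) > 0;
}
```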





[jira] [Updated] (MESOS-7555) Add resource provider IDs to the registry

2017-08-03 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7555:

Shepherd: Jie Yu






[jira] [Assigned] (MESOS-7555) Add resource provider IDs to the registry

2017-08-03 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-7555:
---

  Assignee: Benjamin Bannier
Sprint: Mesosphere Sprint 60
Remaining Estimate: 5m
 Original Estimate: 5m






[jira] [Created] (MESOS-7856) Mesos Build issue while installation on Centos 7.3 with SELinux enabled

2017-08-03 Thread Naina Emmanuel (JIRA)
Naina Emmanuel created MESOS-7856:
-

 Summary: Mesos Build issue while installation on Centos 7.3 with 
SELinux enabled
 Key: MESOS-7856
 URL: https://issues.apache.org/jira/browse/MESOS-7856
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 1.3.0
 Environment: Software Platform
Reporter: Naina Emmanuel
Priority: Critical


I am installing Mesos on CentOS 7.3 (I tried with SELinux in both permissive and 
enforcing modes) as a single-node cluster running the latest Mesos. When I build 
from source, I get these errors after running the make command:
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
make[5]: *** [libprocess_la-http.lo] Error 1
make[5]: Leaving directory `/root/mesos/build/3rdparty/libprocess'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/root/mesos/build/3rdparty/libprocess'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/root/mesos/build/3rdparty/libprocess'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/root/mesos/build/3rdparty'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/root/mesos/build/3rdparty'
make: *** [all-recursive] Error 1

For installation, I am following this link 
https://gist.github.com/anubhavsinha/97ca001257e5b7308edb#file-install-apache-mesos-sh
 






[jira] [Created] (MESOS-7855) re-compile the mesos src-code is too slow,when I re-compile the src code, it took half hour.

2017-08-03 Thread y123456yz (JIRA)
y123456yz created MESOS-7855:


 Summary: re-compile the mesos src-code is too slow,when I 
re-compile the src code, it took  half hour. 
 Key: MESOS-7855
 URL: https://issues.apache.org/jira/browse/MESOS-7855
 Project: Mesos
  Issue Type: Bug
  Components: cmake
Affects Versions: 1.2.0
Reporter: y123456yz


re-compile the mesos src-code is too slow

I only added one line with printf, but when I recompiled the source code, it took 
half an hour. 
My god.
How can I speed up recompiling the source code?






[jira] [Updated] (MESOS-7795) Remove "latest" symlink after agent reboot

2017-08-03 Thread Ilya Pronin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Pronin updated MESOS-7795:
---
Shepherd: Yan Xu

> Remove "latest" symlink after agent reboot
> --
>
> Key: MESOS-7795
> URL: https://issues.apache.org/jira/browse/MESOS-7795
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Ilya Pronin
>Assignee: Ilya Pronin
>Priority: Minor
>
> Currently, when the agent detects that the host was rebooted, it doesn't 
> recover agent info. New agent info is not checkpointed until the agent 
> successfully registers with a master. If the agent crashes before 
> registering, on restart it will recover the old agent info that was 
> checkpointed before the host reboot.
> This can lead to problems. E.g., the agent may flap due to incompatible agent 
> info if its resources somehow change after the reboot. Or the use of the old 
> agent ID in the reregistration process may cause crashes like MESOS-7432.
> We can remove the "latest" symlink when we detect that the current boot ID is 
> different from the checkpointed one, to prevent the agent from recovering 
> stale info after we checkpoint the new boot ID. Alternatively, we can postpone 
> boot ID checkpointing until we have checkpointed the new agent info.
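A minimal C++17 sketch of the first option, assuming the agent's meta directory holds the checkpointed boot ID in a `boot_id` file and the symlink at `slaves/latest` (a layout assumed here for illustration). The helper names `readFirstLine` and `removeLatestIfRebooted` are hypothetical. The current boot ID is passed in by the caller; on Linux it can be read from `/proc/sys/kernel/random/boot_id`.

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns the first line of `path`, if readable.
std::optional<std::string> readFirstLine(const fs::path& path)
{
  std::ifstream in(path);
  std::string line;
  if (!in || !std::getline(in, line)) {
    return std::nullopt;
  }
  return line;
}

// Sketch: if the boot ID checkpointed under `metaDir` differs from
// `currentBootId`, remove the "latest" symlink so that a crash before
// re-registration cannot recover the pre-reboot agent info. Returns true
// only if the symlink existed and was removed. The file layout
// (`boot_id`, `slaves/latest`) is an assumption.
bool removeLatestIfRebooted(
    const fs::path& metaDir,
    const std::string& currentBootId)
{
  const std::optional<std::string> checkpointed =
      readFirstLine(metaDir / "boot_id");
  if (!checkpointed || *checkpointed == currentBootId) {
    return false;  // No checkpoint, or same boot: nothing to do.
  }

  const fs::path latest = metaDir / "slaves" / "latest";
  std::error_code error;
  if (!fs::is_symlink(fs::symlink_status(latest, error))) {
    return false;  // Nothing to remove.
  }
  return fs::remove(latest, error);
}
```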





[jira] [Assigned] (MESOS-7795) Remove "latest" symlink after agent reboot

2017-08-03 Thread Ilya Pronin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Pronin reassigned MESOS-7795:
--

Assignee: Ilya Pronin






[jira] [Updated] (MESOS-7728) Java HTTP adapter crashes JVM when leading master disconnects.

2017-08-03 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7728:
---
Labels: mesosphere reliability  (was: mesosphere)

> Java HTTP adapter crashes JVM when leading master disconnects.
> --
>
> Key: MESOS-7728
> URL: https://issues.apache.org/jira/browse/MESOS-7728
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 1.1.2, 1.2.1, 1.3.0
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere, reliability
> Fix For: 1.1.3, 1.2.2, 1.3.1, 1.4.0
>
>
> When a Java scheduler using the HTTP v0-v1 adapter loses the leading Mesos 
> master, {{V0ToV1AdapterProcess::disconnected()}} is invoked, which in turn 
> invokes the Java scheduler [code via 
> JNI|https://github.com/apache/mesos/blob/87c38b9e2bc5b1030a071ddf0aab69db70d64781/src/java/jni/org_apache_mesos_v1_scheduler_V0Mesos.cpp#L446].
>  This call uses the wrong object, {{jmesos}} instead of {{jscheduler}}, which 
> crashes the JVM:
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4bca3849bf, pid=21, tid=0x7f4b2ac45700
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 
> 1.8.0_131-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x6d39bf]  jni_invoke_nonstatic(JNIEnv_*, JavaValue*, 
> _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x1af
> {noformat}
> {noformat}
> Stack: [0x7f4b2a445000,0x7f4b2ac46000],  sp=0x7f4b2ac44a80,  free 
> space=8190k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> V  [libjvm.so+0x6d39bf]  jni_invoke_nonstatic(JNIEnv_*, JavaValue*, 
> _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x1af
> V  [libjvm.so+0x6d7fef]  jni_CallVoidMethodV+0x10f
> C  [libmesos-1.2.0.so+0x1aa32d3]  JNIEnv_::CallVoidMethod(_jobject*, 
> _jmethodID*, ...)+0x93
> {noformat}





[jira] [Commented] (MESOS-7709) Add --default_container_dns flag to the agent.

2017-08-03 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112370#comment-16112370
 ] 

Qian Zhang commented on MESOS-7709:
---

commit ebfccf4ea12ccc4b02700b44e69ab17affd50019
Author: Qian Zhang 
Date:   Wed Jun 28 14:10:01 2017 +0800

Introduced `--default_container_dns` agent flag.

Review: https://reviews.apache.org/r/60500

commit 48b5ef036905ae5a112af26ab6985953f8179c8c
Author: Qian Zhang 
Date:   Wed Jun 28 22:26:02 2017 +0800

Passed default container DNS info to Docker executor.

Review: https://reviews.apache.org/r/60557

commit 3da83b3318a612b3bbf1223fbafe506c1ed4bcc4
Author: Qian Zhang 
Date:   Fri Jun 30 09:53:41 2017 +0800

Set container DNS with `--default_container_dns` in Docker executor.

Review: https://reviews.apache.org/r/60558

commit cf841cdd482d78e2872f04539cf82d151dffd689
Author: Qian Zhang 
Date:   Mon Jul 24 14:38:22 2017 +0800

Set container DNS with `--default_container_dns` in DockerContainerizer.

Review: https://reviews.apache.org/r/61075

commit 30b49016adc30a9598d426ee35af14ee73963f77
Author: Qian Zhang 
Date:   Mon Jul 3 21:50:17 2017 +0800

Set container DNS with `--default_container_dns` in CNI isolator.

Review: https://reviews.apache.org/r/60600

commit 28faca0ae2b3aeeddd078a978f4b7b2483d03c20
Author: Qian Zhang 
Date:   Tue Jul 11 14:41:17 2017 +0800

Parsed DNS related info from the output of `docker inspect`.

Review: https://reviews.apache.org/r/60760

commit d67595cdeff45f54aa227e5ae33afe6b7ac1c53a
Author: Qian Zhang 
Date:   Tue Jul 11 16:58:43 2017 +0800

Added a test `DockerContainerizerTest.ROOT_DOCKER_DefaultDNS`.

Review: https://reviews.apache.org/r/60761

commit 18ad7fb7473849ba67b1ff007e2d131556eb42d8
Author: Qian Zhang 
Date:   Wed Jul 12 14:48:08 2017 +0800

Added a test `DefaultContainerDNSCniTest.ROOT_VerifyDefaultDNS`.

Review: https://reviews.apache.org/r/60793

commit 01e5bc386c744aa5e9976935674639595557a760
Author: Qian Zhang 
Date:   Sun Jul 30 23:29:10 2017 +0800

Added a test `DefaultContainerDNSFlagTest.ValidateFlag`.

Review: https://reviews.apache.org/r/61245

> Add --default_container_dns flag to the agent.
> --
>
> Key: MESOS-7709
> URL: https://issues.apache.org/jira/browse/MESOS-7709
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> Mesos supports both the CNI (through the `network/cni` isolator) and CNM 
> (through Docker) specifications. Both specifications allow DNS entries for 
> containers to be set on a per-container and per-network basis. 
> Currently, the agent uses the DNS nameservers set in /etc/resolv.conf when 
> the CNI or CNM plugin used to attach the container to the CNI/CNM network 
> doesn't explicitly set the DNS for the container. This is somewhat 
> inflexible, especially when we have a mix of IPv4 and IPv6 networks. 
> The operator should be able to specify DNS nameservers for the networks they 
> install, either to override the ones provided by the plugin or as defaults 
> when the plugins do not specify DNS nameservers.
> To achieve this, we need to introduce a `\--dns` flag to the agent. The 
> `\--dns` flag should accept JSON (or a JSON file) with the following schema:
> {code}
> {
>   "mesos": [
> {
>   "network" : ,
>   "nameservers": []
> }
>   ],
>   "docker": [
> {
>   "network" : ,
>   "nameservers": []
> }
>   ]
> }
> {code}
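A minimal C++ sketch of the precedence described above for one container on one network: an operator entry either overrides the plugin's nameservers or fills in when the plugin sets none, and otherwise the agent keeps today's /etc/resolv.conf behavior. `selectNameservers`, `NetworkDnsConfig`, and the `overridePlugin` switch are hypothetical; the description does not say how the operator chooses between overriding and defaulting.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

using Nameservers = std::vector<std::string>;

// Hypothetical parsed form of one section ("mesos" or "docker") of the
// proposed flag: network name -> operator-configured nameservers.
using NetworkDnsConfig = std::map<std::string, Nameservers>;

// Sketch of the proposed precedence for a single container and network.
// `overridePlugin` is a hypothetical knob selecting between the two
// operator intents: override the plugin, or only fill in when it is silent.
Nameservers selectNameservers(
    const std::string& network,
    const std::optional<Nameservers>& fromPlugin,
    const NetworkDnsConfig& operatorConfig,
    bool overridePlugin,
    const Nameservers& fromHostResolvConf)
{
  const auto configured = operatorConfig.find(network);
  const bool haveOperatorEntry = configured != operatorConfig.end();

  if (haveOperatorEntry && (overridePlugin || !fromPlugin.has_value())) {
    return configured->second;
  }
  if (fromPlugin.has_value()) {
    return *fromPlugin;
  }
  // Current behavior: fall back to the host's /etc/resolv.conf.
  return fromHostResolvConf;
}
```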





[jira] [Created] (MESOS-7854) Authorize resource calls to provider manager api

2017-08-03 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-7854:
---

 Summary: Authorize resource calls to provider manager api
 Key: MESOS-7854
 URL: https://issues.apache.org/jira/browse/MESOS-7854
 Project: Mesos
  Issue Type: Improvement
Reporter: Benjamin Bannier
Priority: Critical


The resource provider manager provides a function
{code}
process::Future api(
const process::http::Request& request,
const Option& principal) const;
{code}
which is exposed, e.g., as an agent endpoint.

We need to add authorization to this function in order to, e.g., stop rogue 
callers.
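A minimal C++ sketch of where such a check could sit: in front of the handler, so unauthorized requests never reach the resource provider manager. The allow-list policy, the `HttpStatus` enum, and `authorizedApi` are hypothetical simplifications; they are not libprocess's `Response` type or the Mesos authorizer interface, which a real implementation would more likely consult.

```cpp
#include <functional>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for an HTTP status; not libprocess's Response type.
enum class HttpStatus { OK = 200, FORBIDDEN = 403 };

// Sketch of an authorization gate in front of the resource provider
// manager's `api()` handler, using an assumed allow-list policy.
HttpStatus authorizedApi(
    const std::optional<std::string>& principal,
    const std::set<std::string>& allowedPrincipals,
    const std::function<HttpStatus()>& handler)
{
  // Refuse callers without a principal, or with one not on the
  // allow-list, without ever invoking the handler.
  if (!principal.has_value() || allowedPrincipals.count(*principal) == 0) {
    return HttpStatus::FORBIDDEN;
  }
  return handler();
}
```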





[jira] [Commented] (MESOS-7846) when I upgrade my executor, mesos-slave kill all my lxc tasks, why? how to deal this porblem.

2017-08-03 Thread y123456yz (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112318#comment-16112318
 ] 

y123456yz commented on MESOS-7846:
--

Enabling checkpointing can resolve this problem.

> when I upgrade my executor, mesos-slave kill all my lxc tasks, why? how to 
> deal this porblem.
> -
>
> Key: MESOS-7846
> URL: https://issues.apache.org/jira/browse/MESOS-7846
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, executor
>Affects Versions: 1.1.0
>Reporter: y123456yz
>
> When I upgrade my executor, mesos-slave kills all my lxc tasks. Why? How can I 
> deal with this problem?





[jira] [Assigned] (MESOS-7853) Support shared PID namespace.

2017-08-03 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang reassigned MESOS-7853:
-

Assignee: Qian Zhang

> Support shared PID namespace.
> -
>
> Key: MESOS-7853
> URL: https://issues.apache.org/jira/browse/MESOS-7853
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Qian Zhang
>  Labels: containerizer, mesosphere, namespaces
>
> Currently, with the 'namespaces/pid' isolator enabled, each container has its 
> own pid namespace. This does not meet the needs of some scenarios. For 
> example, under the same executor container, one task may want to reach 
> another task, which requires the two to share the same pid namespace.
> We should make the container pid namespace configurable, so that users can 
> choose whether a container shares its parent's pid namespace.
> User facing API:
> {noformat}
> message LinuxInfo {
>   ...
>   // True if it shares the pid namespace with its parent. If the
>   // container is a top-level container, this means sharing the pid
>   // namespace with the agent. If the container is a nested
>   // container, this means sharing the pid namespace with its parent
>   // container. This field is ignored if the 'namespaces/pid'
>   // isolator is not enabled.
>   optional bool share_pid_namespace = 4;
> }
> {noformat}
> A new agent flag:
> --disallow_top_level_pid_ns_sharing (defaults to false)
> This addresses a security concern from the operator's perspective: while some 
> nested containers may share the pid namespace of their parents, the flag lets 
> operators ensure that top-level containers never share the pid namespace 
> with the agent.
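A minimal C++ sketch of the proposed semantics, assuming the 'namespaces/pid' isolator is enabled (otherwise the field is ignored). `PidNamespace` and `resolvePidNamespace` are hypothetical names. The description does not say whether a top-level sharing request is rejected or silently downgraded when `--disallow_top_level_pid_ns_sharing` is set; this sketch assumes rejection and signals it with `std::nullopt`.

```cpp
#include <optional>

// Hypothetical summary of which pid namespace a container ends up in.
enum class PidNamespace { OWN, PARENT_CONTAINER, AGENT };

// Sketch of the proposed rules for `share_pid_namespace`. Returns
// std::nullopt when the launch should be rejected (an assumption).
std::optional<PidNamespace> resolvePidNamespace(
    bool sharePidNamespace,
    bool isTopLevel,
    bool disallowTopLevelPidNsSharing)
{
  if (!sharePidNamespace) {
    return PidNamespace::OWN;  // Default: a fresh pid namespace.
  }
  if (!isTopLevel) {
    return PidNamespace::PARENT_CONTAINER;  // Nested: join the parent's.
  }
  if (disallowTopLevelPidNsSharing) {
    return std::nullopt;  // Operator forbids sharing with the agent.
  }
  return PidNamespace::AGENT;  // Top-level: join the agent's.
}
```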





[jira] [Commented] (MESOS-7828) Current approach to parse protobuf enum from JSON does not support upgrades

2017-08-03 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112280#comment-16112280
 ] 

Qian Zhang commented on MESOS-7828:
---

I get the same result when setting 
{{JsonParseOptions.ignore_unknown_fields}} to true. Based on my test, 
{{JsonParseOptions.ignore_unknown_fields}} only takes effect when the JSON 
contains an unknown field (rather than a known field with an unknown enum value).

> Current approach to parse protobuf enum from JSON does not support upgrades
> ---
>
> Key: MESOS-7828
> URL: https://issues.apache.org/jira/browse/MESOS-7828
> Project: Mesos
>  Issue Type: Bug
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>
> To use protobuf enums in a backwards-compatible way, [the suggestion on the 
> protobuf mailing 
> list|https://groups.google.com/forum/#!msg/protobuf/NhUjBfDyGmY/pf294zMi2bIJ] 
> is to use optional enum fields and include an UNKNOWN value as the first 
> entry in the enum list (and/or explicitly specify it as the default). This 
> handles parsing a protobuf message from a serialized string, but it cannot 
> handle parsing a protobuf message from JSON.
> E.g., when I access the master endpoint with a nonexistent enum value {{xxx}}, 
> I get an error:
> {code}
> $ curl -X POST -H "Content-Type: application/json" -d '{"type": "xxx"}' 
> 127.0.0.1:5050/api/v1
> Failed to convert JSON into Call protobuf: Failed to find enum for 'xxx'% 
> {code}
> In the {{Call}} protobuf message, the enum {{Type}} already has a default 
> value {{UNKNOWN}} (see 
> [here|https://github.com/apache/mesos/blob/1.3.0/include/mesos/v1/master/master.proto#L45]
>  for details) and the field {{Call.type}} is optional, but the above curl 
> command still fails. The root cause is that in the code 
> [here|https://github.com/apache/mesos/blob/1.3.0/3rdparty/stout/include/stout/protobuf.hpp#L449:L454],
>  when we try to get the enum value for the string "xxx", the lookup fails 
> since there is no enum value corresponding to "xxx".
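A minimal C++ sketch contrasting the current strict lookup with one possible upgrade-friendly alternative for an optional enum field: treating an unrecognized name as if the field were unset, so the default `UNKNOWN` applies. `CallType` is an illustrative subset shaped like `Call.Type`, not the full Mesos enum, and both parse functions are hypothetical stand-ins for the stout code linked above.

```cpp
#include <map>
#include <optional>
#include <string>

// Illustrative subset shaped like Call.Type: UNKNOWN = 0 comes first and
// serves as the default value. Not the full Mesos enum.
enum class CallType { UNKNOWN = 0, GET_HEALTH, GET_FLAGS };

// Strict lookup, mirroring the current behavior: an unrecognized name
// yields no value, and the caller fails the whole JSON conversion.
std::optional<CallType> parseCallTypeStrict(const std::string& name)
{
  static const std::map<std::string, CallType> kByName{
      {"UNKNOWN", CallType::UNKNOWN},
      {"GET_HEALTH", CallType::GET_HEALTH},
      {"GET_FLAGS", CallType::GET_FLAGS}};
  const auto it = kByName.find(name);
  if (it == kByName.end()) {
    return std::nullopt;
  }
  return it->second;
}

// Lenient alternative for an optional field: an unrecognized name is
// treated as "unset", so the enum's default (UNKNOWN) is used instead.
CallType parseCallTypeLenient(const std::string& name)
{
  return parseCallTypeStrict(name).value_or(CallType::UNKNOWN);
}
```

With this approach, the curl request above would be parsed as a `Call` whose type is `UNKNOWN`, and the endpoint could then reject it with a more specific error than a JSON conversion failure.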


