[jira] [Created] (MESOS-6498) Broken links in authorization documentation

2016-10-27 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6498:
-

 Summary: Broken links in authorization documentation
 Key: MESOS-6498
 URL: https://issues.apache.org/jira/browse/MESOS-6498
 Project: Mesos
  Issue Type: Bug
  Components: documentation
Reporter: Vinod Kone


Looks like a bunch of links in the authorization doc need to be re-written.

https://validator.w3.org/checklink?uri=http%3A%2F%2Fmesos.apache.org%2Fdocumentation%2Flatest%2Fauthorization%2F&hide_type=all&depth=&check=Check



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6489) Better support for containers that want to manage their own cgroup.

2016-10-27 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613942#comment-15613942
 ] 

Yan Xu commented on MESOS-6489:
---

A slightly different proposal for discussion:

h4. Part one
Give {{Future<Nothing> cgroups::destroy(const string& hierarchy, const string& cgroup)}} a {{bool continueOnError = true}} argument. In it:

1. Extract all cgroups (including sub-cgroups) in a bottom-up fashion via 
{{cgroups::get(hierarchy, cgroup)}}.
2. Hand things over to the {{Destroyer}} (i.e., don't remove cgroups in 
{{cgroups::destroy}} itself; let the {{Destroyer}} abstract this part out, since it 
still does its job without a freezer subsystem).
3. The Destroyer first tries to kill tasks. If the freezer subsystem is available, 
it kills the tasks using TaskKillers; if not, this step is a no-op. 
4. The TaskKiller for an individual cgroup will fail if the cgroup is missing. 
It's OK to let it fail; we don't need to change the logic here.
5. The Destroyer is given a {{bool continueOnError}} mode: if {{true}}, it 
{{await}}s all futures (instead of using {{collect}}) and tries to remove cgroups 
recursively regardless of failed futures (see the sketch below). This is safe 
because if there are running tasks in a cgroup, {{remove()}} would just fail. We 
are also not singling out the "somebody removed the cgroup in a race" case, but 
rather giving the Destroyer a more aggressive mode.

If Docker removed a nested cgroup and caused an error, we still propagate the 
error after the more aggressive cleanup, which still ends up destroying 
everything. The caller can decide how to deal with the error.
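A minimal sketch of the {{continueOnError}} idea, assuming the {{std::vector}} 
overloads of {{process::await}}/{{process::collect}} and a hypothetical 
{{removeCgroups()}} helper standing in for the recursive removal step (none of 
this is existing Mesos code):

{code}
// Sketch only: `await` vs. `collect` over the per-cgroup kill futures.
#include <vector>

#include <process/collect.hpp>
#include <process/future.hpp>

#include <stout/nothing.hpp>

using process::Future;

Future<Nothing> removeCgroups();  // Hypothetical: recursive cgroup removal.

Future<Nothing> destroy(
    const std::vector<Future<Nothing>>& kills,
    bool continueOnError)
{
  if (continueOnError) {
    // `await` waits for every kill future to transition (ready, failed,
    // or discarded) and does not fail itself, so removal is always
    // attempted even if some kills failed.
    return process::await(kills)
      .then([](const std::vector<Future<Nothing>>&) {
        return removeCgroups();
      });
  }

  // `collect` fails as soon as any kill fails, so removal is skipped
  // on the first error.
  return process::collect(kills)
    .then([](const std::vector<Nothing>&) {
      return removeCgroups();
    });
}
{code}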

h4. Part two
{{repair}} the Future returned by {{cgroups::destroy(...)}} in 
{{LinuxLauncherProcess::destroy(...)}}:

With the above, the "hacky" part is left to {{Future<Nothing> 
LinuxLauncherProcess::destroy(const ContainerID& containerId)}}, where you can 
repair the destroy:

{code}
return cgroups::destroy(
    freezerHierarchy,
    cgroup(container->id),
    cgroups::DESTROY_TIMEOUT)
  .repair([=](const Future<Nothing>& result) -> Future<Nothing> {
    // Comments explaining this.
    return cgroups::exists(cgroup(container->id)) ? result : Nothing();
  });
{code}

--- 

As a separate improvement, we can freeze the cgroups top-down by reversing the 
list returned by {{cgroups::get()}} when launching {{TaskKillers}}. 
{{TaskKillers}} currently run in parallel, but that should be OK.
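For reference, a tiny sketch of that reversal, assuming {{cgroups::get()}} hands 
back the cgroups bottom-up in a {{std::vector<std::string>}} (container type 
assumed):

{code}
// Sketch: order cgroups top-down (parents first) by reversing the
// bottom-up listing, so freezing starts at the parent cgroup.
#include <algorithm>
#include <string>
#include <vector>

std::vector<std::string> topDown(std::vector<std::string> bottomUp)
{
  std::reverse(bottomUp.begin(), bottomUp.end());
  return bottomUp;
}
{code}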

> Better support for containers that want to manage their own cgroup.
> ---
>
> Key: MESOS-6489
> URL: https://issues.apache.org/jira/browse/MESOS-6489
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> Some containers want to manage their cgroup by sub-dividing the cgroup that 
> Mesos allocates to them into multiple sub-cgroups and putting subprocesses into 
> the corresponding sub-cgroups.
> For instance, someone wants to run the Docker daemon in a Mesos container. The 
> Docker daemon will manage the cgroup assigned to it by Mesos (with the help of, 
> for example, cgroup namespaces).
> Problems arise during the teardown of the container because two entities 
> might be manipulating the same cgroup simultaneously. For example, the Mesos 
> cgroups::destroy might fail if the task running inside is trying to delete 
> the same nested cgroup at the same time.
> To support that case, we should consider killing all the processes in the Mesos 
> cgroup first, making sure that no one will be creating sub-cgroups or moving 
> new processes into sub-cgroups, and then destroying the cgroups recursively.
> We also need the freezer because we want to make sure all processes are stopped 
> while we are sending kill signals, to avoid the TOCTTOU race problem. I think it 
> makes more sense to freeze the cgroups (and sub-cgroups) from the top down 
> rather than bottom up, because typically processes in the parent cgroup 
> manipulate sub-cgroups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6040) Add a CMake build for `mesos-port-mapper`

2016-10-27 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613679#comment-15613679
 ] 

Avinash Sridharan commented on MESOS-6040:
--

Removing this from this sprint since I am traveling and won't have cycles to 
work on this in the next sprint.

> Add a CMake build for `mesos-port-mapper`
> -
>
> Key: MESOS-6040
> URL: https://issues.apache.org/jira/browse/MESOS-6040
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
>
> Once the port-mapper binary compiles with GNU make, we need to modify the 
> CMake build to build the port-mapper binary as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6040) Add a CMake build for `mesos-port-mapper`

2016-10-27 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-6040:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42  (was: Mesosphere Sprint 
41, Mesosphere Sprint 42, Mesosphere Sprint 45)

> Add a CMake build for `mesos-port-mapper`
> -
>
> Key: MESOS-6040
> URL: https://issues.apache.org/jira/browse/MESOS-6040
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
>
> Once the port-mapper binary compiles with GNU make, we need to modify the 
> CMake build to build the port-mapper binary as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks

2016-10-27 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613513#comment-15613513
 ] 

Gilbert Song commented on MESOS-6400:
-

[~mithril], thanks for recording the logs. We will address all related tech 
debt in Mesos. BTW, you can resolve the orphan task issue by tearing down the 
unregistered marathon framework using the workaround in the following doc:

https://gist.github.com/bernadinm/41bca6058f9137cd21f4fb562fd20d50

> Not able to remove Orphan Tasks
> ---
>
> Key: MESOS-6400
> URL: https://issues.apache.org/jira/browse/MESOS-6400
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: centos 7 x64
>Reporter: kasim
>Priority: Critical
>
> The problem may be caused by Mesos and Marathon being out of sync:
> https://github.com/mesosphere/marathon/issues/616
> When I found orphan tasks happening, I
> 1. restarted Marathon;
> 2. Marathon did not sync the orphan tasks, but started new tasks;
> 3. the orphan tasks still took up resources, so I had to delete them;
> 4. I found all orphan tasks are under framework 
> `ef169d8a-24fc-41d1-8b0d-c67718937a48-`, and
> curl -XGET `http://c196:5050/master/frameworks` shows that framework under 
> `unregistered_frameworks`:
> {code}
> {
> "frameworks": [
> .
> ],
> "completed_frameworks": [ ],
> "unregistered_frameworks": [
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-",
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-",
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-"
> ]
> }
> {code}
> 5. I tried {code}curl -XPOST http://c196:5050/master/teardown -d 
> 'frameworkId=ef169d8a-24fc-41d1-8b0d-c67718937a48-' {code}
> but got `No framework found with specified ID`.
> So I have no idea how to delete the orphan tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6497) HTTP Adapter does not surface MasterInfo.

2016-10-27 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613287#comment-15613287
 ] 

Anand Mazumdar edited comment on MESOS-6497 at 10/27/16 9:30 PM:
-

We decided to have an optional {{MasterInfo}} field in the {{SUBSCRIBED}} event, 
thereby providing the schedulers with this information. Another option was 
adding it to the {{connected}} callback on the scheduler library, but we punted 
on that because in the future schedulers might want to use their own detection 
library, which might not read contents from the master ZK to populate 
{{MasterInfo}} correctly.


was (Author: anandmazumdar):
We decided to have an optional {{MasterInfo}} field in the {{SUBSCRIBED}} event 
thereby providing the schedulers with this information. Another option was 
adding it to the {{connected}} callback on the scheduler library but we punted 
on it because in the future schedulers might want to use their own detection 
library that might not read contents from Master ZK. 

> HTTP Adapter does not surface MasterInfo.
> -
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it not 
> compatible with the V0 API where the {{registered}} and {{reregistered}} 
> calls provided the MasterInfo to the framework.
> cc [~vinodkone]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6497) HTTP Adapter does not surface MasterInfo.

2016-10-27 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613287#comment-15613287
 ] 

Anand Mazumdar commented on MESOS-6497:
---

We decided to have an optional {{MasterInfo}} field in the {{SUBSCRIBED}} event, 
thereby providing the schedulers with this information. Another option was 
adding it to the {{connected}} callback on the scheduler library, but we punted 
on that because in the future schedulers might want to use their own detection 
library, which might not read contents from the master ZK. 
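For illustration, a hedged sketch of how a v1 scheduler might consume such a 
field once it lands; the {{master_info}} accessor and the header path are 
assumptions based on the proposal above, not the current API:

{code}
// Sketch only: reading an assumed `master_info` field from SUBSCRIBED.
#include <iostream>

#include <mesos/v1/scheduler/scheduler.hpp>  // Header path assumed.

using mesos::v1::scheduler::Event;

void received(const Event& event)
{
  if (event.type() == Event::SUBSCRIBED &&
      event.subscribed().has_master_info()) {  // Assumed optional field.
    // The scheduler learns the leading master without relying on its
    // own detector or on the `connected` callback.
    std::cout << "Leading master: "
              << event.subscribed().master_info().hostname() << std::endl;
  }
}
{code}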

> HTTP Adapter does not surface MasterInfo.
> -
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it not 
> compatible with the V0 API where the {{registered}} and {{reregistered}} 
> calls provided the MasterInfo to the framework.
> cc [~vinodkone]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6497) HTTP Adapter does not surface MasterInfo.

2016-10-27 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6497:
--
   Shepherd: Vinod Kone
Description: 
The HTTP adapter does not surface the {{MasterInfo}}. This makes it not 
compatible with the V0 API where the {{registered}} and {{reregistered}} calls 
provided the MasterInfo to the framework.
cc [~vinodkone]

  was:
The HTTP adapter does not surface the MasterInfo. This makes it not compatible 
with the V0 API where the {{registered}} and {{reregistered}} calls provided 
the MasterInfo to the framework.
cc [~vinodkone]

Summary: HTTP Adapter does not surface MasterInfo.  (was: HTTP Adapter 
does not surface MasterInfo)

> HTTP Adapter does not surface MasterInfo.
> -
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it not 
> compatible with the V0 API where the {{registered}} and {{reregistered}} 
> calls provided the MasterInfo to the framework.
> cc [~vinodkone]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6462) Design Doc: Mesos Support for Container Attach and Container Exec

2016-10-27 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6462:
---
Description: 
Here is a link to the design doc:
https://docs.google.com/document/d/1nAVr0sSSpbDLrgUlAEB5hKzCl482NSVk8V0D56sFMzU

It is not yet complete, but it is filled out enough to start eliciting 
feedback. Please feel free to add comments (or even add content!) as you wish.

  was:
Here is a link to the design doc:
https://docs.google.com/document/d/1nAVr0sSSpbDLrgUlAEB5hKzCl482NSVk8V0D56sFMzU/edit#heading=h.jcjim99nrfbv

It is not yet complete, but it is filled out enough to start eliciting 
feedback. Please feel free to add comments (or even add content!) as you wish.


> Design Doc: Mesos Support for Container Attach and Container Exec
> -
>
> Key: MESOS-6462
> URL: https://issues.apache.org/jira/browse/MESOS-6462
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Here is a link to the design doc:
> https://docs.google.com/document/d/1nAVr0sSSpbDLrgUlAEB5hKzCl482NSVk8V0D56sFMzU
> It is not yet complete, but it is filled out enough to start eliciting 
> feedback. Please feel free to add comments (or even add content!) as you wish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6462) Design Doc: Mesos Support for Container Attach and Container Exec

2016-10-27 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6462:
---
Description: 
Here is a link to the design doc:
https://docs.google.com/document/d/1nAVr0sSSpbDLrgUlAEB5hKzCl482NSVk8V0D56sFMzU/edit#heading=h.jcjim99nrfbv

It is not yet complete, but it is filled out enough to start eliciting 
feedback. Please feel free to add comments (or even add content!) as you wish.

> Design Doc: Mesos Support for Container Attach and Container Exec
> -
>
> Key: MESOS-6462
> URL: https://issues.apache.org/jira/browse/MESOS-6462
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Here is a link to the design doc:
> https://docs.google.com/document/d/1nAVr0sSSpbDLrgUlAEB5hKzCl482NSVk8V0D56sFMzU/edit#heading=h.jcjim99nrfbv
> It is not yet complete, but it is filled out enough to start eliciting 
> feedback. Please feel free to add comments (or even add content!) as you wish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6446) WebUI redirect doesn't work with stats from /metric/snapshot

2016-10-27 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6446:
--
Fix Version/s: 1.0.2

Cherry-picked for 1.0.2

commit ec315f28e6f86813af3b756be190e1b48c21404d
Author: Vinod Kone 
Date:   Thu Oct 27 10:28:29 2016 -0700

Added MESOS-6446 to CHANGELOG for 1.0.2.

commit 1ca4db714fd0acc6095a7a0e14c373a3775df528
Author: haosdent huang 
Date:   Thu Oct 27 10:22:18 2016 -0700

Fixed the broken metrics information of master in WebUI.

After we introduced redirection on `/master/state` endpoint to the
leading master in `c9153336`, the metrics information in the WebUI
was broken when the current master is not the leading master.

In this patch, we retrieve the leading master from `/master/state`
endpoint and ensure that requests to `/metrics/snapshot` and `/state`
endpoints are always sent to the leading master.

Review: https://reviews.apache.org/r/53172/

commit ef90134ccbcd3239241a6d5571aaaf0192e1c294
Author: haosdent huang 
Date:   Thu Oct 27 10:22:09 2016 -0700

Show the leading master's information in `/master/state` endpoint.

Review: https://reviews.apache.org/r/53193/


> WebUI redirect doesn't work with stats from /metric/snapshot
> 
>
> Key: MESOS-6446
> URL: https://issues.apache.org/jira/browse/MESOS-6446
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: haosdent
>Priority: Blocker
> Fix For: 1.0.2, 1.2.0
>
> Attachments: Screen Shot 2016-10-21 at 12.04.23 PM.png, 
> webui_metrics.gif
>
>
> After Mesos 1.0, the webUI redirect is hidden from the users, so you can go to 
> any of the masters and the webUI is populated with state.json from the leading 
> master. 
> This doesn't include stats from /metric/snapshot though, as it is not 
> redirected. The user ends up seeing some fields with empty values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6446) WebUI redirect doesn't work with stats from /metric/snapshot

2016-10-27 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6446:
--
Fix Version/s: 1.1.0

Cherry-picked for 1.1.0

commit e69f819fc996f4c328a8968131a1e807a0692bf1
Author: Vinod Kone 
Date:   Thu Oct 27 13:29:33 2016 -0700

Added MESOS-6446 to 1.1.0 CHANGELOG.

commit 4ac4916a39a6beb81fab0c8d7d72fc6c06e2e650
Author: haosdent huang 
Date:   Thu Oct 27 10:22:18 2016 -0700

Fixed the broken metrics information of master in WebUI.

After we introduced redirection on `/master/state` endpoint to the
leading master in `c9153336`, the metrics information in the WebUI
was broken when the current master is not the leading master.

In this patch, we retrieve the leading master from `/master/state`
endpoint and ensure that requests to `/metrics/snapshot` and `/state`
endpoints are always sent to the leading master.

Review: https://reviews.apache.org/r/53172/

commit 0d747295cbcb897f245ef209a7760f0fad558a35
Author: haosdent huang 
Date:   Thu Oct 27 10:22:09 2016 -0700

Show the leading master's information in `/master/state` endpoint.

Review: https://reviews.apache.org/r/53193/


> WebUI redirect doesn't work with stats from /metric/snapshot
> 
>
> Key: MESOS-6446
> URL: https://issues.apache.org/jira/browse/MESOS-6446
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: haosdent
>Priority: Blocker
> Fix For: 1.0.2, 1.1.0, 1.2.0
>
> Attachments: Screen Shot 2016-10-21 at 12.04.23 PM.png, 
> webui_metrics.gif
>
>
> After Mesos 1.0, the webUI redirect is hidden from the users, so you can go to 
> any of the masters and the webUI is populated with state.json from the leading 
> master. 
> This doesn't include stats from /metric/snapshot though, as it is not 
> redirected. The user ends up seeing some fields with empty values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6497) HTTP Adapter does not surface MasterInfo

2016-10-27 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-6497:
---

 Summary: HTTP Adapter does not surface MasterInfo
 Key: MESOS-6497
 URL: https://issues.apache.org/jira/browse/MESOS-6497
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Joris Van Remoortere
Assignee: Anand Mazumdar
Priority: Blocker


The HTTP adapter does not surface the MasterInfo. This makes it not compatible 
with the V0 API where the {{registered}} and {{reregistered}} calls provided 
the MasterInfo to the framework.
cc [~vinodkone]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6372) Improvements to shared resources

2016-10-27 Thread Anindya Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anindya Sinha reassigned MESOS-6372:


Assignee: Anindya Sinha

> Improvements to shared resources
> 
>
> Key: MESOS-6372
> URL: https://issues.apache.org/jira/browse/MESOS-6372
> Project: Mesos
>  Issue Type: Epic
>Reporter: Yan Xu
>Assignee: Anindya Sinha
>
> This is a follow up epic to MESOS-3421 to capture further improvements and 
> changes that need to be made to the MVP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5792) Add mesos tests to CMake (make check)

2016-10-27 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5792:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42, 
Mesosphere Sprint 44  (was: Mesosphere Sprint 40, Mesosphere Sprint 41, 
Mesosphere Sprint 42, Mesosphere Sprint 44, Mesosphere Sprint 45)

> Add mesos tests to CMake (make check)
> -
>
> Key: MESOS-5792
> URL: https://issues.apache.org/jira/browse/MESOS-5792
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Reporter: Srinivas
>Assignee: Srinivas
>  Labels: build, mesosphere
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Provide CMakeLists.txt and configuration files to build mesos tests using 
> CMake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6496) Support up-casting of Shared and Owned

2016-10-27 Thread Neil Conway (JIRA)
Neil Conway created MESOS-6496:
--

 Summary: Support up-casting of Shared and Owned
 Key: MESOS-6496
 URL: https://issues.apache.org/jira/browse/MESOS-6496
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Reporter: Neil Conway


It should be possible to pass a {{Shared<T2>}} value to an object that takes a 
parameter of type {{Shared<T1>}}; similarly for {{Owned}}. In general, 
{{Shared<T2>}} should be implicitly convertible to {{Shared<T1>}} iff {{T2}} is 
implicitly convertible to {{T1}}. In C++11, this works for the standard smart 
pointers because they define the appropriate conversion constructors.
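For context, a small sketch of the kind of converting constructor involved, 
modeled on {{std::shared_ptr}} (a simplified stand-in, not the actual 
{{Shared}}/{{Owned}} implementation):

{code}
// Simplified stand-in for Shared<T>: a templated converting constructor,
// enabled only when T2* converts to T1*, allows the up-cast.
#include <memory>
#include <type_traits>

template <typename T1>
class MyShared
{
public:
  explicit MyShared(T1* ptr) : data(ptr) {}

  // Converting constructor: MyShared<T2> -> MyShared<T1> iff T2* -> T1*.
  template <typename T2,
            typename = typename std::enable_if<
                std::is_convertible<T2*, T1*>::value>::type>
  MyShared(const MyShared<T2>& that) : data(that.data) {}

  template <typename> friend class MyShared;

private:
  std::shared_ptr<T1> data;
};

struct Base {};
struct Derived : Base {};

void take(const MyShared<Base>&) {}

int main()
{
  MyShared<Derived> derived(new Derived());
  take(derived);  // Works thanks to the converting constructor.
  return 0;
}
{code}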



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6446) WebUI redirect doesn't work with stats from /metric/snapshot

2016-10-27 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6446:

Shepherd: Vinod Kone

> WebUI redirect doesn't work with stats from /metric/snapshot
> 
>
> Key: MESOS-6446
> URL: https://issues.apache.org/jira/browse/MESOS-6446
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: haosdent
>Priority: Blocker
> Fix For: 1.2.0
>
> Attachments: Screen Shot 2016-10-21 at 12.04.23 PM.png, 
> webui_metrics.gif
>
>
> After Mesos 1.0, the webUI redirect is hidden from the users, so you can go to 
> any of the masters and the webUI is populated with state.json from the leading 
> master. 
> This doesn't include stats from /metric/snapshot though, as it is not 
> redirected. The user ends up seeing some fields with empty values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6212) Validate the name format of mesos-managed docker containers

2016-10-27 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612536#comment-15612536
 ] 

Anand Mazumdar commented on MESOS-6212:
---

Keeping the JIRA open till I complete the backport to 1.0.2.

> Validate the name format of mesos-managed docker containers
> ---
>
> Key: MESOS-6212
> URL: https://issues.apache.org/jira/browse/MESOS-6212
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 1.0.1
>Reporter: Marc Villacorta
>Assignee: Manuwela Kanade
> Fix For: 1.1.0
>
>
> Validate the name format of mesos-managed docker containers in order to avoid 
> false positives when looking for orphaned mesos tasks.
> Currently names such as _'mesos-master'_, _'mesos-agent'_ and _'mesos-dns'_ 
> are wrongly terminated when {{--docker_kill_orphans}} is set to true 
> (default).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6212) Validate the name format of mesos-managed docker containers

2016-10-27 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6212:
--
Target Version/s: 1.0.2, 1.1.0  (was: 1.0.2)
   Fix Version/s: (was: 1.0.2)
  1.1.0

> Validate the name format of mesos-managed docker containers
> ---
>
> Key: MESOS-6212
> URL: https://issues.apache.org/jira/browse/MESOS-6212
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 1.0.1
>Reporter: Marc Villacorta
>Assignee: Manuwela Kanade
> Fix For: 1.1.0
>
>
> Validate the name format of mesos-managed docker containers in order to avoid 
> false positives when looking for orphaned mesos tasks.
> Currently names such as _'mesos-master'_, _'mesos-agent'_ and _'mesos-dns'_ 
> are wrongly terminated when {{--docker_kill_orphans}} is set to true 
> (default).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6458) Add test to check fromString function of stout library

2016-10-27 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6458:
--
Target Version/s:   (was: 1.0.2)
   Fix Version/s: 1.1.0

> Add test to check fromString function of stout library
> --
>
> Key: MESOS-6458
> URL: https://issues.apache.org/jira/browse/MESOS-6458
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 1.0.1
>Reporter: Manuwela Kanade
>Assignee: Manuwela Kanade
>Priority: Trivial
> Fix For: 1.1.0
>
>
> For the 3rdparty stout library, there is a test case for checking a malformed 
> UUID, but there is no positive test for the fromString function verifying that 
> it returns the correct UUID when passed a correctly formatted UUID string.
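A sketch of what such a positive round-trip test could look like, using the 
existing stout UUID helpers (the test name is illustrative):

{code}
// Sketch only: positive test for UUID::fromString().
#include <gtest/gtest.h>

#include <stout/gtest.hpp>
#include <stout/try.hpp>
#include <stout/uuid.hpp>

TEST(UUIDTest, FromStringRoundTrip)
{
  UUID uuid = UUID::random();

  // Parsing the string form of a valid UUID should succeed and
  // yield the same UUID back.
  Try<UUID> parsed = UUID::fromString(uuid.toString());

  ASSERT_SOME(parsed);
  EXPECT_EQ(uuid, parsed.get());
}
{code}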



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6495) Create metrics for HTTP API endpoint response codes.

2016-10-27 Thread Zhitao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhitao Li updated MESOS-6495:
-
Summary: Create metrics for HTTP API endpoint response codes.  (was: Create 
metrics for HTTP API endpoint)

> Create metrics for HTTP API endpoint response codes.
> 
>
> Key: MESOS-6495
> URL: https://issues.apache.org/jira/browse/MESOS-6495
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Zhitao Li
>
> We should have some metrics about the various response codes for the 
> (scheduler) HTTP API (2xx, 4xx, etc.).
> [~anandmazumdar] suggested that ideally the solution could be easily extended 
> to cover other endpoints if we directly enhance libprocess, so that we can 
> cover the other APIs (Master/Agent) as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6495) Create metrics for HTTP API endpoint

2016-10-27 Thread Zhitao Li (JIRA)
Zhitao Li created MESOS-6495:


 Summary: Create metrics for HTTP API endpoint
 Key: MESOS-6495
 URL: https://issues.apache.org/jira/browse/MESOS-6495
 Project: Mesos
  Issue Type: Improvement
Reporter: Zhitao Li


We should have some metrics about the various response codes for the (scheduler) 
HTTP API (2xx, 4xx, etc.).

[~anandmazumdar] suggested that ideally the solution could be easily extended 
to cover other endpoints if we directly enhance libprocess, so that we can cover 
the other APIs (Master/Agent) as well.
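A minimal sketch of what such counters could look like using the libprocess 
metrics primitives; the metric names and the place where {{count()}} would be 
called are assumptions, not an existing Mesos API:

{code}
// Sketch only: per-response-class counters with libprocess metrics.
#include <process/metrics/counter.hpp>
#include <process/metrics/metrics.hpp>

struct ResponseMetrics
{
  ResponseMetrics()
    : responses_2xx("scheduler/http/responses_2xx"),
      responses_4xx("scheduler/http/responses_4xx"),
      responses_5xx("scheduler/http/responses_5xx")
  {
    process::metrics::add(responses_2xx);
    process::metrics::add(responses_4xx);
    process::metrics::add(responses_5xx);
  }

  ~ResponseMetrics()
  {
    process::metrics::remove(responses_2xx);
    process::metrics::remove(responses_4xx);
    process::metrics::remove(responses_5xx);
  }

  // Call this wherever the API handler produces a response (the hook
  // point is an assumption for the sketch).
  void count(int code)
  {
    if (code >= 200 && code < 300) {
      ++responses_2xx;
    } else if (code >= 400 && code < 500) {
      ++responses_4xx;
    } else if (code >= 500) {
      ++responses_5xx;
    }
  }

  process::metrics::Counter responses_2xx;
  process::metrics::Counter responses_4xx;
  process::metrics::Counter responses_5xx;
};
{code}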



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6494) Clean up the flags parsing in the executors

2016-10-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612390#comment-15612390
 ] 

Gastón Kleiman commented on MESOS-6494:
---

Patches in the chain starting with: https://reviews.apache.org/r/52878/

> Clean up the flags parsing in the executors
> ---
>
> Key: MESOS-6494
> URL: https://issues.apache.org/jira/browse/MESOS-6494
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>
> The current executors and the executor libraries use a mix of `stout::flags` 
> and `os::getenv` to parse flags, leading to a lot of unnecessary and 
> sometimes duplicated code.
> This should be cleaned up, using only {{stout::flags}} to parse flags.
> Environment variables should be used for the flags that are common to ALL the 
> executors (listed in the Executor HTTP API doc).
> Command line parameters should be used for flags that apply only to 
> individual executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6494) Clean up the flags parsing in the executors

2016-10-27 Thread JIRA
Gastón Kleiman created MESOS-6494:
-

 Summary: Clean up the flags parsing in the executors
 Key: MESOS-6494
 URL: https://issues.apache.org/jira/browse/MESOS-6494
 Project: Mesos
  Issue Type: Improvement
Reporter: Gastón Kleiman
Assignee: Gastón Kleiman


The current executors and the executor libraries use a mix of `stout::flags` 
and `os::getenv` to parse flags, leading to a lot of unnecessary and sometimes 
duplicated code.

This should be cleaned up, using only {{stout::flags}} to parse flags.

Environment variables should be used for the flags that are common to ALL the 
executors (listed in the Executor HTTP API doc).

Command line parameters should be used for flags that apply only to individual 
executors.
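A minimal sketch of the {{stout::flags}} pattern this points to; the flag names 
and defaults below are illustrative, not the actual executor flag set:

{code}
// Sketch only: one flags struct parsing both MESOS_* environment
// variables and command line flags via stout::flags.
#include <iostream>
#include <string>

#include <stout/flags.hpp>
#include <stout/option.hpp>

struct ExecutorFlags : public flags::FlagsBase
{
  ExecutorFlags()
  {
    add(&ExecutorFlags::launcher_dir,
        "launcher_dir",
        "Directory with executor helper binaries (illustrative).");

    add(&ExecutorFlags::shutdown_grace_period,
        "shutdown_grace_period",
        "Time to wait before forcibly killing tasks (illustrative).",
        "5secs");
  }

  Option<std::string> launcher_dir;
  std::string shutdown_grace_period;
};

int main(int argc, char** argv)
{
  ExecutorFlags flags;

  // load() parses `MESOS_`-prefixed environment variables as well as
  // `--` command line arguments, replacing the hand-rolled os::getenv
  // calls. (The exact return type differs across stout versions, so we
  // only check for an error here.)
  if (flags.load("MESOS_", argc, argv).isError()) {
    std::cerr << "Failed to load flags" << std::endl;
    return 1;
  }

  return 0;
}
{code}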



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6212) Validate the name format of mesos-managed docker containers

2016-10-27 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6212:
--
Shepherd: Timothy Chen  (was: Anand Mazumdar)

> Validate the name format of mesos-managed docker containers
> ---
>
> Key: MESOS-6212
> URL: https://issues.apache.org/jira/browse/MESOS-6212
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 1.0.1
>Reporter: Marc Villacorta
>Assignee: Manuwela Kanade
> Fix For: 1.0.2
>
>
> Validate the name format of mesos-managed docker containers in order to avoid 
> false positives when looking for orphaned mesos tasks.
> Currently names such as _'mesos-master'_, _'mesos-agent'_ and _'mesos-dns'_ 
> are wrongly terminated when {{--docker_kill_orphans}} is set to true 
> (default).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6493) Add test cases for the HTTPS health checks.

2016-10-27 Thread haosdent (JIRA)
haosdent created MESOS-6493:
---

 Summary: Add test cases for the HTTPS health checks.
 Key: MESOS-6493
 URL: https://issues.apache.org/jira/browse/MESOS-6493
 Project: Mesos
  Issue Type: Task
  Components: tests
Reporter: haosdent
Assignee: haosdent
 Fix For: 1.2.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6492) Deprecate the existing `SSL_` env variables

2016-10-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-6492:
--
Target Version/s: 1.2.0

> Deprecate the existing `SSL_` env variables
> ---
>
> Key: MESOS-6492
> URL: https://issues.apache.org/jira/browse/MESOS-6492
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Reporter: Gastón Kleiman
>
> `SSL_` env variables are deprecated by `LIBPROCES_SSL_`.
> Cleanup the code once the deprecation cycle is over.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6327) Large docker images causes container launch failures: Too many levels of symbolic links

2016-10-27 Thread Rogier Dikkes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612156#comment-15612156
 ] 

Rogier Dikkes commented on MESOS-6327:
--

More information: 
Last week I created a Docker image containing 21 layers, based on 
ubuntu:16.04 with a few packages. Today I updated the image to remove a 
typo in it, and the image increased 30MB in size (not layers). Now I'm running 
into the issue described above.

imagename  0.2.7   be78f88bb969   37 minutes ago   418.3 MB
imagename  0.2.6   2022190ada2c   7 days ago       391.9 MB

Some years ago the LXC community ran into this too; back then it was autofs 
causing issues. I have ensured autofs and automount were not running on the 
hosts.

> Large docker images causes container launch failures: Too many levels of 
> symbolic links
> ---
>
> Key: MESOS-6327
> URL: https://issues.apache.org/jira/browse/MESOS-6327
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0, 1.0.1
> Environment: centos 7.2 (1511), ubuntu 14.04 (trusty). Replicated in 
> the Apache Aurora vagrant image
>Reporter: Rogier Dikkes
>Priority: Critical
>
> When deploying Mesos containers with large (6G+, 60+ layers) Docker images 
> the task crashes with the error: 
> Mesos agent logs: 
> E1007 08:40:12.954227  8117 slave.cpp:3976] Container 
> 'a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4' for executor 
> 'thermos-www-data-devel-hello_docker_image-0-d42d2af6-6b44-4b2b-be95-e1ba93a6b365'
>  of framework df
> c91a86-84b9-4539-a7be-4ace7b7b44a1- failed to start: Collect failed: 
> Collect failed: Failed to copy layer: cp: cannot stat 
> ‘/var/lib/mesos/provisioner/containers/a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4/b
> ackends/copy/rootfses/5f328f72-25d4-4a26-ac83-8d30bbc44e97/usr/share/zoneinfo/right/Asia/Urumqi’:
>  Too many levels of symbolic links
> ... (complete pastebin: http://pastebin.com/umZ4Q5d1 )
> How to replicate:
> Start the Aurora vagrant image. Adjust 
> /etc/mesos-slave/executor_registration_timeout to 5 mins. Adjust the file 
> /vagrant/examples/jobs/hello_docker_image.aurora to start a large Docker 
> image instead of the example (you can use anldisr/jupyter:0.4, which I created as a 
> test image; it is based upon the Jupyter notebook stacks). Create the job and 
> watch it fail after x number of minutes. 
> The Mesos sandbox is empty. 
> Aurora errors I see: 
> 28 minutes ago - FAILED : Failed to launch container: Collect failed: Collect 
> failed: Failed to copy layer: cp: cannot stat 
> ‘/var/lib/mesos/provisioner/containers/93420a36-0e0c-4f04-b401-74c426c25686/backends/copy/rootfses/6e185a51-7174-4b0d-a305-42b634eb91bb/usr/share/zoneinfo/right/Asia/Urumqi’:
>  Too many levels of symbolic links cp: cannot stat ... 
> Too many levels of symbolic links ; Container destroyed while provisioning 
> images
> (complete pastebin: http://pastebin.com/uecHYD5J )
> To rule out the image, I started this and other images as normal Docker 
> containers. This works without issues. 
> Related Mesos flags configured: 
> -appc_store_dir 
> /tmp/mesos/images/appc
> -containerizers 
> docker,mesos
> -executor_registration_timeout 
> 5mins
> -image_providers 
> appc,docker
> -image_provisioner_backend 
> copy
> -isolation 
> filesystem/linux,docker/runtime
> Affected Mesos versions tested: 1.0.1 & 1.0.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6492) Deprecate the existing `SSL_` env variables

2016-10-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-6492:
--
Description: 
{{SSL_}} env variables are deprecated by {{LIBPROCES_SSL_}}.

Cleanup the code once the deprecation cycle is over.

  was:
`SSL_` env variables are deprecated by `LIBPROCES_SSL_`.

Cleanup the code once the deprecation cycle is over.


> Deprecate the existing `SSL_` env variables
> ---
>
> Key: MESOS-6492
> URL: https://issues.apache.org/jira/browse/MESOS-6492
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Reporter: Gastón Kleiman
>
> {{SSL_}} env variables are deprecated by {{LIBPROCES_SSL_}}.
> Cleanup the code once the deprecation cycle is over.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6327) Large docker images causes container launch failures: Too many levels of symbolic links

2016-10-27 Thread Rogier Dikkes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612156#comment-15612156
 ] 

Rogier Dikkes edited comment on MESOS-6327 at 10/27/16 3:08 PM:


More information: 
Last week I created a Docker image containing 21 layers, based on 
ubuntu:16.04 with a few packages. Today I updated the image to remove a 
typo in it, and the image increased 30MB in size (not layers), I suspect because 
of package updates. Now I'm running into the issue described above.

imagename  0.2.7   be78f88bb969   37 minutes ago   418.3 MB
imagename  0.2.6   2022190ada2c   7 days ago       391.9 MB

Some years ago the LXC community ran into this too; back then it was autofs 
causing issues. I have ensured autofs and automount were not running on the 
hosts.


was (Author: a-nldisr):
More information: 
Last week i created an docker image containing 21 layers which is based on 
ubuntu:16.04 containing a few packages, today i updated the image to remove a 
typo in it and the image increased 30MB in size (not layers). Now im running 
into the issue as above.

imagename  0.2.7   be78f88bb96937 minutes ago  418.3 MB
imagename  0.2.6   2022190ada2c7 days ago  391.9 MB

Some years ago the lxc community ran into this too, back then it was autofs 
causing issues. I have ensured autofs and automount were not running on the 
hosts.

> Large docker images causes container launch failures: Too many levels of 
> symbolic links
> ---
>
> Key: MESOS-6327
> URL: https://issues.apache.org/jira/browse/MESOS-6327
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0, 1.0.1
> Environment: centos 7.2 (1511), ubuntu 14.04 (trusty). Replicated in 
> the Apache Aurora vagrant image
>Reporter: Rogier Dikkes
>Priority: Critical
>
> When deploying Mesos containers with large (6G+, 60+ layers) Docker images 
> the task crashes with the error: 
> Mesos agent logs: 
> E1007 08:40:12.954227  8117 slave.cpp:3976] Container 
> 'a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4' for executor 
> 'thermos-www-data-devel-hello_docker_image-0-d42d2af6-6b44-4b2b-be95-e1ba93a6b365'
>  of framework df
> c91a86-84b9-4539-a7be-4ace7b7b44a1- failed to start: Collect failed: 
> Collect failed: Failed to copy layer: cp: cannot stat 
> ‘/var/lib/mesos/provisioner/containers/a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4/b
> ackends/copy/rootfses/5f328f72-25d4-4a26-ac83-8d30bbc44e97/usr/share/zoneinfo/right/Asia/Urumqi’:
>  Too many levels of symbolic links
> ... (complete pastebin: http://pastebin.com/umZ4Q5d1 )
> How to replicate:
> Start the Aurora vagrant image. Adjust 
> /etc/mesos-slave/executor_registration_timeout to 5 mins. Adjust the file 
> /vagrant/examples/jobs/hello_docker_image.aurora to start a large Docker 
> image instead of the example (you can use anldisr/jupyter:0.4, which I created as a 
> test image; it is based upon the Jupyter notebook stacks). Create the job and 
> watch it fail after x number of minutes. 
> The Mesos sandbox is empty. 
> Aurora errors I see: 
> 28 minutes ago - FAILED : Failed to launch container: Collect failed: Collect 
> failed: Failed to copy layer: cp: cannot stat 
> ‘/var/lib/mesos/provisioner/containers/93420a36-0e0c-4f04-b401-74c426c25686/backends/copy/rootfses/6e185a51-7174-4b0d-a305-42b634eb91bb/usr/share/zoneinfo/right/Asia/Urumqi’:
>  Too many levels of symbolic links cp: cannot stat ... 
> Too many levels of symbolic links ; Container destroyed while provisioning 
> images
> (complete pastebin: http://pastebin.com/uecHYD5J )
> To rule out the image, I started this and other images as normal Docker 
> containers. This works without issues. 
> Related Mesos flags configured: 
> -appc_store_dir 
> /tmp/mesos/images/appc
> -containerizers 
> docker,mesos
> -executor_registration_timeout 
> 5mins
> -image_providers 
> appc,docker
> -image_provisioner_backend 
> copy
> -isolation 
> filesystem/linux,docker/runtime
> Affected Mesos versions tested: 1.0.1 & 1.0.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6492) Deprecate the existing `SSL_` env variables

2016-10-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-6492:
--
Description: 
{{SSL_}} env variables are deprecated by {{LIBPROCESS_SSL_}}.

Cleanup the code once the deprecation cycle is over.

  was:
{{SSL_}} env variables are deprecated by {{LIBPROCES_SSL_}}.

Cleanup the code once the deprecation cycle is over.


> Deprecate the existing `SSL_` env variables
> ---
>
> Key: MESOS-6492
> URL: https://issues.apache.org/jira/browse/MESOS-6492
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Reporter: Gastón Kleiman
>
> {{SSL_}} env variables are deprecated by {{LIBPROCESS_SSL_}}.
> Cleanup the code once the deprecation cycle is over.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6458) Add test to check fromString function of stout library

2016-10-27 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-6458:

Shepherd: Timothy Chen

> Add test to check fromString function of stout library
> --
>
> Key: MESOS-6458
> URL: https://issues.apache.org/jira/browse/MESOS-6458
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 1.0.1
>Reporter: Manuwela Kanade
>Assignee: Manuwela Kanade
>Priority: Trivial
>
> For the 3rdparty stout library, there is a test case for checking a malformed 
> UUID, but there is no positive test for the fromString function verifying that 
> it returns the correct UUID when passed a correctly formatted UUID string.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6492) Deprecate the existing `SSL_` env variables

2016-10-27 Thread JIRA
Gastón Kleiman created MESOS-6492:
-

 Summary: Deprecate the existing `SSL_` env variables
 Key: MESOS-6492
 URL: https://issues.apache.org/jira/browse/MESOS-6492
 Project: Mesos
  Issue Type: Task
  Components: libprocess
Reporter: Gastón Kleiman


`SSL_` env variables are deprecated by `LIBPROCES_SSL_`.

Cleanup the code once the deprecation cycle is over.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6293) HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.

2016-10-27 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612021#comment-15612021
 ] 

haosdent commented on MESOS-6293:
-

As per [~alexr]'s investigation, the test case fails when {{LIBPROCESS_IP}} is 
set in the environment while running the test cases, because the master binds 
to {{LIBPROCESS_IP}} and does not listen on {{127.0.0.1}}.

> HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.
> 
>
> Key: MESOS-6293
> URL: https://issues.apache.org/jira/browse/MESOS-6293
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> I see consistent failures of this test in the internal CI in *some* distros, 
> specifically CentOS 6, Ubuntu 14, 15, 16. The source of the health check 
> failure is always the same: {{curl}} cannot connect to the target:
> {noformat}
> Received task health update, healthy: false
> W0929 17:22:05.270992  2730 health_checker.cpp:204] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) couldn't connect to host
> I0929 17:22:05.273634 26850 slave.cpp:3609] Handling status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from executor(1)@172.30.2.20:58660
> I0929 17:22:05.274178 26844 status_update_manager.cpp:323] Received status 
> update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274226 26844 status_update_manager.cpp:377] Forwarding update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to the agent
> I0929 17:22:05.274314 26845 slave.cpp:4026] Forwarding the update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to master@172.30.2.20:38955
> I0929 17:22:05.274415 26845 slave.cpp:3920] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274436 26845 slave.cpp:3936] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for 
> task aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of 
> framework 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to 
> executor(1)@172.30.2.20:58660
> I0929 17:22:05.274534 26849 master.cpp:5661] Status update TASK_RUNNING 
> (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from agent 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-S0 at slave(77)@172.30.2.20:38955 
> (ip-172-30-2-20.mesosphere.io)
> ../../src/tests/health_check_tests.cpp:1398: Failure
> I0929 17:22:05.274567 26849 master.cpp:5723] Forwarding status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> Value of: statusHealth.get().healthy()
>   Actual: false
>   Expected: true
> I0929 17:22:05.274636 26849 master.cpp:7560] Updating the state of task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- (latest state: TASK_RUNNING, status 
> update state: TASK_RUNNING)
> I0929 17:22:05.274829 26844 sched.cpp:1025] Scheduler::statusUpdate took 
> 43297ns
> Received SHUTDOWN event
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6293) HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.

2016-10-27 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6293:

Assignee: Alexander Rukletsov  (was: haosdent)

> HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.
> 
>
> Key: MESOS-6293
> URL: https://issues.apache.org/jira/browse/MESOS-6293
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> I see consistent failures of this test in the internal CI in *some* distros, 
> specifically CentOS 6, Ubuntu 14, 15, 16. The source of the health check 
> failure is always the same: {{curl}} cannot connect to the target:
> {noformat}
> Received task health update, healthy: false
> W0929 17:22:05.270992  2730 health_checker.cpp:204] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) couldn't connect to host
> I0929 17:22:05.273634 26850 slave.cpp:3609] Handling status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from executor(1)@172.30.2.20:58660
> I0929 17:22:05.274178 26844 status_update_manager.cpp:323] Received status 
> update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274226 26844 status_update_manager.cpp:377] Forwarding update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to the agent
> I0929 17:22:05.274314 26845 slave.cpp:4026] Forwarding the update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to master@172.30.2.20:38955
> I0929 17:22:05.274415 26845 slave.cpp:3920] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274436 26845 slave.cpp:3936] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for 
> task aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of 
> framework 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to 
> executor(1)@172.30.2.20:58660
> I0929 17:22:05.274534 26849 master.cpp:5661] Status update TASK_RUNNING 
> (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from agent 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-S0 at slave(77)@172.30.2.20:38955 
> (ip-172-30-2-20.mesosphere.io)
> ../../src/tests/health_check_tests.cpp:1398: Failure
> I0929 17:22:05.274567 26849 master.cpp:5723] Forwarding status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> Value of: statusHealth.get().healthy()
>   Actual: false
>   Expected: true
> I0929 17:22:05.274636 26849 master.cpp:7560] Updating the state of task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- (latest state: TASK_RUNNING, status 
> update state: TASK_RUNNING)
> I0929 17:22:05.274829 26844 sched.cpp:1025] Scheduler::statusUpdate took 
> 43297ns
> Received SHUTDOWN event
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6293) HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.

2016-10-27 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6293:

Shepherd:   (was: Alexander Rukletsov)

> HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.
> 
>
> Key: MESOS-6293
> URL: https://issues.apache.org/jira/browse/MESOS-6293
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> I see consistent failures of this test in the internal CI in *some* distros, 
> specifically CentOS 6, Ubuntu 14, 15, 16. The source of the health check 
> failure is always the same: {{curl}} cannot connect to the target:
> {noformat}
> Received task health update, healthy: false
> W0929 17:22:05.270992  2730 health_checker.cpp:204] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) couldn't connect to host
> I0929 17:22:05.273634 26850 slave.cpp:3609] Handling status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from executor(1)@172.30.2.20:58660
> I0929 17:22:05.274178 26844 status_update_manager.cpp:323] Received status 
> update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274226 26844 status_update_manager.cpp:377] Forwarding update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to the agent
> I0929 17:22:05.274314 26845 slave.cpp:4026] Forwarding the update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to master@172.30.2.20:38955
> I0929 17:22:05.274415 26845 slave.cpp:3920] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274436 26845 slave.cpp:3936] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for 
> task aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of 
> framework 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to 
> executor(1)@172.30.2.20:58660
> I0929 17:22:05.274534 26849 master.cpp:5661] Status update TASK_RUNNING 
> (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from agent 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-S0 at slave(77)@172.30.2.20:38955 
> (ip-172-30-2-20.mesosphere.io)
> ../../src/tests/health_check_tests.cpp:1398: Failure
> I0929 17:22:05.274567 26849 master.cpp:5723] Forwarding status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> Value of: statusHealth.get().healthy()
>   Actual: false
>   Expected: true
> I0929 17:22:05.274636 26849 master.cpp:7560] Updating the state of task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- (latest state: TASK_RUNNING, status 
> update state: TASK_RUNNING)
> I0929 17:22:05.274829 26844 sched.cpp:1025] Scheduler::statusUpdate took 
> 43297ns
> Received SHUTDOWN event
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6293) HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.

2016-10-27 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612004#comment-15612004
 ] 

Alexander Rukletsov commented on MESOS-6293:


https://reviews.apache.org/r/53226/

> HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.
> 
>
> Key: MESOS-6293
> URL: https://issues.apache.org/jira/browse/MESOS-6293
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>  Labels: health-check, mesosphere
>
> I see consistent failures of this test in the internal CI in *some* distros, 
> specifically CentOS 6, Ubuntu 14, 15, 16. The source of the health check 
> failure is always the same: {{curl}} cannot connect to the target:
> {noformat}
> Received task health update, healthy: false
> W0929 17:22:05.270992  2730 health_checker.cpp:204] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) couldn't connect to host
> I0929 17:22:05.273634 26850 slave.cpp:3609] Handling status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from executor(1)@172.30.2.20:58660
> I0929 17:22:05.274178 26844 status_update_manager.cpp:323] Received status 
> update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274226 26844 status_update_manager.cpp:377] Forwarding update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to the agent
> I0929 17:22:05.274314 26845 slave.cpp:4026] Forwarding the update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to master@172.30.2.20:38955
> I0929 17:22:05.274415 26845 slave.cpp:3920] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274436 26845 slave.cpp:3936] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for 
> task aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of 
> framework 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to 
> executor(1)@172.30.2.20:58660
> I0929 17:22:05.274534 26849 master.cpp:5661] Status update TASK_RUNNING 
> (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from agent 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-S0 at slave(77)@172.30.2.20:38955 
> (ip-172-30-2-20.mesosphere.io)
> ../../src/tests/health_check_tests.cpp:1398: Failure
> I0929 17:22:05.274567 26849 master.cpp:5723] Forwarding status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> Value of: statusHealth.get().healthy()
>   Actual: false
>   Expected: true
> I0929 17:22:05.274636 26849 master.cpp:7560] Updating the state of task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- (latest state: TASK_RUNNING, status 
> update state: TASK_RUNNING)
> I0929 17:22:05.274829 26844 sched.cpp:1025] Scheduler::statusUpdate took 
> 43297ns
> Received SHUTDOWN event
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6279) Add test cases for the TCP health check.

2016-10-27 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6279:
---
Summary: Add test cases for the TCP health check.  (was: Add test cases for 
the TCP health check)

> Add test cases for the TCP health check.
> 
>
> Key: MESOS-6279
> URL: https://issues.apache.org/jira/browse/MESOS-6279
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: haosdent
>Assignee: haosdent
>  Labels: health-check, mesosphere, test
> Fix For: 1.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6446) WebUI redirect doesn't work with stats from /metric/snapshot

2016-10-27 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611795#comment-15611795
 ] 

haosdent edited comment on MESOS-6446 at 10/27/16 1:41 PM:
---

Yes, [~vinodkone] is reviewing patches.


was (Author: haosd...@gmail.com):
Yes, [~vinodkone] are reviewing patches.

> WebUI redirect doesn't work with stats from /metric/snapshot
> 
>
> Key: MESOS-6446
> URL: https://issues.apache.org/jira/browse/MESOS-6446
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: haosdent
>Priority: Blocker
> Attachments: Screen Shot 2016-10-21 at 12.04.23 PM.png, 
> webui_metrics.gif
>
>
> After Mesos 1.0, the webUI redirect is hidden from the users, so you can go to 
> any of the masters and the webUI is populated with state.json from the leading 
> master. 
> This doesn't include stats from /metric/snapshot though, as that endpoint is not 
> redirected. The user ends up seeing some fields with empty values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6484) Memory leak in `Future::after()`

2016-10-27 Thread Alexander Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rojas updated MESOS-6484:
---
Description: 
The problem arises when one tries to associate an {{after()}} call to copied 
futures. The following test case is enough to reproduce the issue:

{code}
TEST(FutureTest, After3)
{
  auto policy = std::make_shared<int>(0);

  {
    auto generator = []() {
      return Future<Nothing>();
    };

    Future<Nothing> future = generator()
      .after(Milliseconds(1),
             [policy](const Future<Nothing>&) {
               return Nothing();
             });

    AWAIT_READY(future);
  }

  EXPECT_EQ(1, policy.use_count());
}
{code}

In the test, one would expect that there is only one active reference to 
{{policy}}, therefore the expectation {{EXPECT_EQ(1, policy.use_count())}}. 
However, if after is triggered more than once, each extra call adds one 
undeleted reference to {{policy}}.

  was:
The problem arises when one tries to associate an {{after()}} call to copied 
futures. The following test case is enough to reproduce the issue:

{code}
class Policy
{
public:
  virtual Try<Duration> timeout() = 0;
  virtual Duration totalTimeout() = 0;

  virtual ~Policy() {}
};

class MockPolicy : public Policy
{
public:
  virtual ~MockPolicy() {}

  MOCK_METHOD0(timeout, Try<Duration>());
  MOCK_METHOD0(totalTimeout, Duration());
};

template <typename T>
process::Future<T> retry(
    const std::function<process::Future<T>()>& action,
    const std::shared_ptr<Policy>& policy)
{
  CHECK(policy != nullptr);

  Try<Duration> timeout = policy->timeout();
  if (timeout.isError()) {
    return Future<T>::failed(timeout.error());
  }

  return action()
    .after(timeout.get(), [action, policy](const Future<T>&) {
      return retry(action, policy);
    });
}

TEST(FutureTest, Retry)
{
  auto policy = std::make_shared<MockPolicy>();

  EXPECT_CALL(*policy, timeout())
      .WillRepeatedly(Return(Milliseconds(1)));

  unsigned callCount = 0;
  auto future = retry<Nothing>([&callCount]() -> Future<Nothing> {
      ++callCount;
      if (callCount < 4) {
        return Future<Nothing>();
      }
      return Nothing();
    },
    policy);

  AWAIT_READY(future);
  EXPECT_EQ(1, policy.use_count());
}
{code}

In the test, one would expect that there is only one active reference to 
{{policy}}, therefore the expectation {{EXPECT_EQ(1, policy.use_count())}}. 
However, if after is triggered more than once, each extra call adds one 
undeleted reference to {{policy}}.


> Memory leak in `Future::after()`
> ---
>
> Key: MESOS-6484
> URL: https://issues.apache.org/jira/browse/MESOS-6484
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.1.0
>Reporter: Alexander Rojas
>  Labels: libprocess, mesosphere
>
> The problem arises when one tries to associate an {{after()}} call to copied 
> futures. The following test case is enough to reproduce the issue:
> {code}
> TEST(FutureTest, After3)
> {
>   auto policy = std::make_shared<int>(0);
>   {
>     auto generator = []() {
>       return Future<Nothing>();
>     };
>     Future<Nothing> future = generator()
>       .after(Milliseconds(1),
>              [policy](const Future<Nothing>&) {
>                return Nothing();
>              });
>     AWAIT_READY(future);
>   }
>   EXPECT_EQ(1, policy.use_count());
> }
> {code}
> In the test, one would expect that there is only one active reference to 
> {{policy}}, therefore the expectation {{EXPECT_EQ(1, policy.use_count())}}. 
> However, if after is triggered more than once, each extra call adds one 
> undeleted reference to {{policy}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6446) WebUI redirect doesn't work with stats from /metric/snapshot

2016-10-27 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611795#comment-15611795
 ] 

haosdent commented on MESOS-6446:
-

Yes, [~vinodkone] is reviewing patches.

> WebUI redirect doesn't work with stats from /metric/snapshot
> 
>
> Key: MESOS-6446
> URL: https://issues.apache.org/jira/browse/MESOS-6446
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: haosdent
>Priority: Blocker
> Attachments: Screen Shot 2016-10-21 at 12.04.23 PM.png, 
> webui_metrics.gif
>
>
> After Mesos 1.0, the webUI redirect is hidden from the users, so you can go to 
> any of the masters and the webUI is populated with state.json from the leading 
> master. 
> This doesn't include stats from /metric/snapshot though, as that endpoint is not 
> redirected. The user ends up seeing some fields with empty values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6278) Add test cases for the HTTP health checks.

2016-10-27 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6278:
---
Summary: Add test cases for the HTTP health checks.  (was: Add test cases 
for the HTTP health checks)

> Add test cases for the HTTP health checks.
> --
>
> Key: MESOS-6278
> URL: https://issues.apache.org/jira/browse/MESOS-6278
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: haosdent
>Assignee: haosdent
>  Labels: health-check, mesosphere, test
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6446) WebUI redirect doesn't work with stats from /metric/snapshot

2016-10-27 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611654#comment-15611654
 ] 

Till Toenshoff commented on MESOS-6446:
---

[~vinodkone] are you shepherding this?

> WebUI redirect doesn't work with stats from /metric/snapshot
> 
>
> Key: MESOS-6446
> URL: https://issues.apache.org/jira/browse/MESOS-6446
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.0.0
>Reporter: Yan Xu
>Assignee: haosdent
>Priority: Blocker
> Attachments: Screen Shot 2016-10-21 at 12.04.23 PM.png, 
> webui_metrics.gif
>
>
> After Mesos 1.0, the webUI redirect is hidden from the users, so you can go to 
> any of the masters and the webUI is populated with state.json from the leading 
> master. 
> This doesn't include stats from /metric/snapshot though, as that endpoint is not 
> redirected. The user ends up seeing some fields with empty values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6491) Mesos dashboard: Allow to download zip file of task sandbox

2016-10-27 Thread Mischa Krüger (JIRA)
Mischa Krüger created MESOS-6491:


 Summary: Mesos dashboard: Allow to download zip file of task 
sandbox
 Key: MESOS-6491
 URL: https://issues.apache.org/jira/browse/MESOS-6491
 Project: Mesos
  Issue Type: Wish
  Components: webui
Reporter: Mischa Krüger
Priority: Minor


The Mesos dashboard should have a little "Download sandbox as .zip" button or 
similar which allows downloading the complete sandbox with a single click. 
This makes sharing sandboxes much easier, as there's no need to click on every 
file of the sandbox and download each file separately.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6484) Memory leak in `Future::after()`

2016-10-27 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611284#comment-15611284
 ] 

Alexander Rojas commented on MESOS-6484:


I've been looking into this for a couple of days now. I have narrowed it down to 
[this 
snippet|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/future.hpp#L1411-L1415]:

{code:title=future.hpp}
  Timer timer = Clock::timer(
      duration,
      lambda::bind(&internal::expired<T>, f, latch, promise, *this));

  onAny(lambda::bind(&internal::after<T>, latch, promise, timer, lambda::_1));
{code}

If the {{timer}} expires without the future being set, a copy of the {{timer}} 
is kept somewhere. However, if {{Clock::cancel(timer)}} is called (because the 
future is set), the timer is properly destroyed. One copy is left behind for 
every expired timer; I just haven't found who owns that copy of the timer.
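
To make the suspected mechanism concrete, here is a minimal standalone sketch 
(not libprocess code; the {{pendingTimers}} registry is purely illustrative) of 
how a stored callback that captures a {{shared_ptr}} keeps the reference count 
elevated until the stored copy itself is released:

{code:title=sketch.cpp}
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for wherever the bound timer callbacks end up living.
static std::vector<std::function<void()>> pendingTimers;

int main()
{
  auto policy = std::make_shared<int>(0);

  {
    // Analogous to `.after(...)`: the lambda captures `policy` by value.
    pendingTimers.push_back([policy]() { /* would complete the future */ });
  }

  // The stored callback still holds a copy of `policy`, so use_count() is 2
  // rather than the expected 1.
  assert(policy.use_count() == 2);

  // Only releasing the stored callback drops the count back to 1, which is
  // what the `Clock::cancel(timer)` path seems to achieve.
  pendingTimers.clear();
  assert(policy.use_count() == 1);

  return 0;
}
{code}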

> Memory leak in `Future::after()`
> ---
>
> Key: MESOS-6484
> URL: https://issues.apache.org/jira/browse/MESOS-6484
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.1.0
>Reporter: Alexander Rojas
>  Labels: libprocess, mesosphere
>
> The problem arises when one tries to associate an {{after()}} call to copied 
> futures. The following test case is enough to reproduce the issue:
> {code}
> class Policy
> {
> public:
>   virtual Try<Duration> timeout() = 0;
>   virtual Duration totalTimeout() = 0;
>   virtual ~Policy() {}
> };
> class MockPolicy : public Policy
> {
> public:
>   virtual ~MockPolicy() {}
>   MOCK_METHOD0(timeout, Try<Duration>());
>   MOCK_METHOD0(totalTimeout, Duration());
> };
> template <typename T>
> process::Future<T> retry(
>     const std::function<process::Future<T>()>& action,
>     const std::shared_ptr<Policy>& policy)
> {
>   CHECK(policy != nullptr);
>   Try<Duration> timeout = policy->timeout();
>   if (timeout.isError()) {
>     return Future<T>::failed(timeout.error());
>   }
>   return action()
>     .after(timeout.get(), [action, policy](const Future<T>&) {
>       return retry(action, policy);
>     });
> }
> TEST(FutureTest, Retry)
> {
>   auto policy = std::make_shared<MockPolicy>();
>   EXPECT_CALL(*policy, timeout())
>       .WillRepeatedly(Return(Milliseconds(1)));
>   unsigned callCount = 0;
>   auto future = retry<Nothing>([&callCount]() -> Future<Nothing> {
>       ++callCount;
>       if (callCount < 4) {
>         return Future<Nothing>();
>       }
>       return Nothing();
>     },
>     policy);
>   AWAIT_READY(future);
>   EXPECT_EQ(1, policy.use_count());
> }
> {code}
> In the test, one would expect that there is only one active reference to 
> {{policy}}, therefore the expectation {{EXPECT_EQ(1, policy.use_count())}}. 
> However, if after is triggered more than once, each extra call adds one 
> undeleted reference to {{policy}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6489) Better support for containers that want to manage their own cgroup.

2016-10-27 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611011#comment-15611011
 ] 

Anindya Sinha commented on MESOS-6489:
--

Jotting down some thoughts based on our previous conversation:

In {{Future<Nothing> destroy(const string& hierarchy, const string& cgroup)}}:

1. Extract all cgroups (including sub-cgroups) in bottom-up fashion via 
{{cgroups::get(hierarchy, cgroup)}}.

2. If the freezer is available:
2a. We use {{TasksKiller}} to freeze cgroups, {{SIGKILL}} all tasks, and thaw 
cgroups (possibly in top-down fashion). However, we add a new attribute to this 
class, {{bool ignoreMissingCgroup}}. If that is set, we ignore any error in 
{{TasksKiller::finished()}} for cgroups that do not exist.
2b. At this point, we remove the cgroups in bottom-up fashion in case there is 
no error reported by {{TasksKiller}}. We bail out with an error if any cgroup 
removal fails. Similar to step #2a, we ignore errors for cgroups that do not 
exist.

3. If the freezer is unavailable, we remove the cgroups starting from the 
bottom up using {{cgroups::remove(hierarchy, cgroup)}}. If the removal fails 
because the cgroup is not present, we ignore that failure.

We will have the "ignore error due to missing cgroup" handling in two places, 
viz. {{TasksKiller::finished()}} and {{cgroups::destroy}}; a rough sketch of 
that check follows below.
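
A minimal sketch of that check, assuming the existing {{cgroups::exists}} and 
{{cgroups::remove}} helpers (the wrapper name is illustrative, not an existing 
API):

{code:title=sketch}
// Illustrative helper: treat "cgroup no longer exists" as success when
// removing bottom-up, since the container (e.g. a Docker daemon inside it)
// may have removed its own nested cgroups concurrently.
Try<Nothing> removeIgnoringMissing(
    const string& hierarchy,
    const string& cgroup)
{
  Try<bool> exists = cgroups::exists(hierarchy, cgroup);
  if (exists.isSome() && !exists.get()) {
    // The cgroup is already gone; nothing left to do.
    return Nothing();
  }

  return cgroups::remove(hierarchy, cgroup);
}
{code}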

> Better support for containers that want to manage their own cgroup.
> ---
>
> Key: MESOS-6489
> URL: https://issues.apache.org/jira/browse/MESOS-6489
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> Some containers want to manage their cgroup by sub-dividing the cgroup that 
> Mesos allocates to them into multiple sub-cgroups and put subprocess into the 
> corresponding sub-cgroups.
> For instance, someone wants to run a Docker daemon in a Mesos container. The 
> Docker daemon will manage the cgroup assigned to it by Mesos (with the help of, 
> for example, the cgroups namespace).
> Problems arise during the teardown of the container because two entities 
> might be manipulating the same cgroup simultaneously. For example, the Mesos 
> cgroups::destroy might fail if the task running inside is trying to delete 
> the same nested cgroup at the same time.
> To support that case, we should consider killing all the processes in the Mesos 
> cgroup first, making sure that no one will be creating sub-cgroups and moving 
> new processes into sub-cgroups. And then, destroy the cgroups recursively.
> And we need the freezer because we want to make sure all processes are stopped 
> while we are sending kill signals, to avoid a TOCTTOU race. I think it 
> makes more sense to freeze the cgroups (and sub-cgroups) top-down 
> (rather than bottom-up) because, typically, processes in the parent cgroup 
> manipulate sub-cgroups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)