[jira] [Updated] (MESOS-7957) The REGISTER_FRAMEWORK_WITH_ROLE action is not used in the source code

2017-09-11 Thread jackyoh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jackyoh updated MESOS-7957:
---
Shepherd: Adam B

> The REGISTER_FRAMEWORK_WITH_ROLE action is not used in the source code
> 
>
> Key: MESOS-7957
> URL: https://issues.apache.org/jira/browse/MESOS-7957
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: jackyoh
>Priority: Trivial
>
> The Mesos test code has the REGISTER_FRAMEWORK_WITH_ROLE action in 
> src/tests/authorization_tests.cpp, but the non-test source code does not
> use the REGISTER_FRAMEWORK_WITH_ROLE action.
> Can I remove the REGISTER_FRAMEWORK_WITH_ROLE action from 
> src/tests/authorization_tests.cpp?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-11 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-7964:
---
Comment: was deleted

(was: Patch for review: https://reviews.apache.org/r/62230/)

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.4.1
>
>
> An agent is observed to perform heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During that period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded either. 
> When a disk-usage GC is triggered, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that these actors would use up all of the worker threads, and 
> some of them took a long time to finish (possibly due to many files to delete, 
> or too many fs operations at once, etc.).

[jira] [Comment Edited] (MESOS-6428) Mesos containerizer helper function signalSafeWriteStatus is not AS-Safe

2017-09-11 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162355#comment-16162355
 ] 

Till Toenshoff edited comment on MESOS-6428 at 9/12/17 1:51 AM:


[~jamespeach] we commonly remove the duplication from the RR subject / 
description from our commit messages before pushing.

So instead of  
{noformat}
commit 905c758782f8587276ee207261277517a34482a2
Author: Andrei Budnik 
Date:   Wed Sep 6 22:02:29 2017 -0700

Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`.

Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`.

Review: https://reviews.apache.org/r/61801/
{noformat}

it would be 

{noformat}
commit 905c758782f8587276ee207261277517a34482a2
Author: Andrei Budnik 
Date:   Wed Sep 6 22:02:29 2017 -0700

Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`.

Review: https://reviews.apache.org/r/61801/
{noformat}



was (Author: tillt):
[~jamespeach] we commonly remove the duplication from the RR subject / 
description from our commit messages before pushing.

So instead of  
```
commit 905c758782f8587276ee207261277517a34482a2
Author: Andrei Budnik 
Date:   Wed Sep 6 22:02:29 2017 -0700

Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`.

Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`.

Review: https://reviews.apache.org/r/61801/
```

it would be 

```
commit 905c758782f8587276ee207261277517a34482a2
Author: Andrei Budnik 
Date:   Wed Sep 6 22:02:29 2017 -0700

Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`.

Review: https://reviews.apache.org/r/61801/
```

> Mesos containerizer helper function signalSafeWriteStatus is not AS-Safe
> 
>
> Key: MESOS-6428
> URL: https://issues.apache.org/jira/browse/MESOS-6428
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.0
>Reporter: Benjamin Bannier
>Assignee: Andrei Budnik
>  Labels: newbie, tech-debt
> Fix For: 1.5.0
>
>
> In {{src/slave/containerizer/mesos/launch.cpp}} a helper function 
> {{signalSafeWriteStatus}} is defined. Its name seems to suggest that this 
> function is safe to call in e.g., signal handlers, and it is used in this 
> file's {{signalHandler}} for exactly that purpose.
> Currently this function is not AS-Safe since it e.g., allocates memory via 
> construction of {{string}} instances, and might destructively modify 
> {{errno}}.
> We should clean up this function so that it is in fact AS-Safe.
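
For context, the gist of the fix is to restrict the helper to async-signal-safe
operations. Below is a minimal sketch of such a writer, assuming a pre-opened
file descriptor and a caller-formatted buffer; the name {{signalSafeWrite}} and
its signature are hypothetical and not the actual Mesos code:

{noformat}
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

// Writes a caller-formatted buffer to a pre-opened fd using only write(2):
// no heap allocation, no locks, and errno is saved and restored.
void signalSafeWrite(int fd, const char* buffer, size_t size)
{
  int saved = errno;  // write() may clobber errno; restore it on exit.

  size_t offset = 0;
  while (offset < size) {
    ssize_t written = ::write(fd, buffer + offset, size - offset);
    if (written < 0) {
      if (errno == EINTR) {
        continue;  // Retry if interrupted by another signal.
      }
      break;  // Nothing else can safely be done inside a signal handler.
    }
    offset += static_cast<size_t>(written);
  }

  errno = saved;
}


int main()
{
  const char message[] = "status written without heap allocation\n";
  signalSafeWrite(STDERR_FILENO, message, sizeof(message) - 1);
  return 0;
}
{noformat}

In this shape the helper only calls write(2), constructs no std::string, and
preserves errno around the writes.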



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-6428) Mesos containerizer helper function signalSafeWriteStatus is not AS-Safe

2017-09-11 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162355#comment-16162355
 ] 

Till Toenshoff commented on MESOS-6428:
---

[~jamespeach] we commonly remove the duplication from the RR subject / 
description from our commit messages before pushing.

So instead of  
```
commit 905c758782f8587276ee207261277517a34482a2
Author: Andrei Budnik 
Date:   Wed Sep 6 22:02:29 2017 -0700

Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`.

Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`.

Review: https://reviews.apache.org/r/61801/
```

it would be 

```
commit 905c758782f8587276ee207261277517a34482a2
Author: Andrei Budnik 
Date:   Wed Sep 6 22:02:29 2017 -0700

Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`.

Review: https://reviews.apache.org/r/61801/
```

> Mesos containerizer helper function signalSafeWriteStatus is not AS-Safe
> 
>
> Key: MESOS-6428
> URL: https://issues.apache.org/jira/browse/MESOS-6428
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.0
>Reporter: Benjamin Bannier
>Assignee: Andrei Budnik
>  Labels: newbie, tech-debt
> Fix For: 1.5.0
>
>
> In {{src/slave/containerizer/mesos/launch.cpp}} a helper function 
> {{signalSafeWriteStatus}} is defined. Its name seems to suggest that this 
> function is safe to call in e.g., signal handlers, and it is used in this 
> file's {{signalHandler}} for exactly that purpose.
> Currently this function is not AS-Safe since it e.g., allocates memory via 
> construction of {{string}} instances, and might destructively modify 
> {{errno}}.
> We should clean up this function so that it is in fact AS-Safe.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-2728) Introduce concept of cluster wide resources.

2017-09-11 Thread Huadong Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162234#comment-16162234
 ] 

Huadong Liu commented on MESOS-2728:


[~jieyu], [~vinodkone] Is there an update on this?

> Introduce concept of cluster wide resources.
> 
>
> Key: MESOS-2728
> URL: https://issues.apache.org/jira/browse/MESOS-2728
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: external-volumes, mesosphere
>
> There are resources which are not provided by a single node. Consider, for 
> example, the external network bandwidth of a cluster. Being a limited resource, 
> it makes sense for Mesos to manage it, but it is still not a resource offered 
> by a single node. A cluster-wide resource is still consumed by a 
> task, and when that task completes, the resources are then available to be 
> allocated to another framework/task.
> Use Cases:
> 1. Network Bandwidth
> 2. IP Addresses
> 3. Global Service Ports
> 4. Distributed File System Storage
> 5. Software Licences
> 6. SAN Volumes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-5482) mesos/marathon task stuck in staging after slave reboot

2017-09-11 Thread Mao Geng (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162232#comment-16162232
 ] 

Mao Geng commented on MESOS-5482:
-

[~chhsia0] the problem happened when the agent lost its connection with the 
master and re-registered; no one was actually shutting down Marathon. MESOS-7215 
looks like the root cause. 
When the agent re-registered, it shut down all executors of non-partition-aware 
frameworks, including the Marathon task. Meanwhile Marathon tried to launch a new 
task on the agent, and the agent ignored the launch because it thought the 
framework was shutting down, so the task got stuck in the "staging" state. Then 
Marathon tried to kill the task because its deployment was overdue, which the 
agent ignored as well. 
Restarting the agent resolves the issue, though.

> mesos/marathon task stuck in staging after slave reboot
> ---
>
> Key: MESOS-5482
> URL: https://issues.apache.org/jira/browse/MESOS-5482
> Project: Mesos
>  Issue Type: Bug
>Reporter: lutful karim
>  Labels: tech-debt
> Attachments: marathon-mesos-masters_after-reboot.log, 
> mesos-masters_mesos.log, mesos_slaves_after_reboot.log, 
> tasks_running_before_rebooot.marathon
>
>
> The main idea of mesos/marathon is to let operators sleep well, but after a 
> node reboot the mesos task gets stuck in staging for about 4 hours.
> To reproduce the issue: 
> - set up a mesos cluster in HA mode with systemd-enabled mesos-master and 
> mesos-slave services.
> - run the docker registry (https://hub.docker.com/_/registry/ ) with the mesos 
> constraint (hostname:LIKE:mesos-slave-1) on one node. Reboot the node and 
> notice that the task gets stuck in staging.
> Possible workaround: a service mesos-slave restart fixes the issue.
> OS: centos 7.2
> mesos version: 0.28.1
> marathon: 1.1.1
> zookeeper: 3.4.8
> docker: 1.9.1 dockerAPIversion: 1.21
> error message:
> May 30 08:38:24 euca-10-254-237-140 mesos-slave[832]: W0530 08:38:24.120013   
> 909 slave.cpp:2018] Ignoring kill task 
> docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3 because the executor 
> 'docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3' of framework 
> 8517fcb7-f2d0-47ad-ae02-837570bef929- is terminating/terminated



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-11 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-7964:
---
Component/s: agent

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.4.1
>
>
> An agent is observed to perform heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During that period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded either. 
> When a disk-usage GC is triggered, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that these actors would use up all of the worker threads, and 
> some of them took a long time to finish (possibly due to many files to delete, 
> or too many fs operations at once, etc.).

[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-11 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-7964:
---
Sprint: Mesosphere Sprint 63

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.4.1
>
>
> An agent is observed to perform heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During that period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded either. 
> When a disk-usage GC is triggered, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that these actors would use up all of the worker threads, and 
> some of them took a long time to finish (possibly due to many files to delete, 
> or too many fs operations at once, etc.).

[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-11 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-7964:
---
 Story Points: 2
Fix Version/s: 1.4.1

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
> Fix For: 1.4.1
>
>
> An agent is observed to perform heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During that period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded either. 
> When a disk-usage GC is triggered, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that these actors would use up all of the worker threads, and 
> some of them took a long time to finish (possibly due to many files to delete, 
> or too many fs operations at once, etc.).

[jira] [Created] (MESOS-7965) WebUI folder not set in CMake

2017-09-11 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-7965:
---

 Summary: WebUI folder not set in CMake
 Key: MESOS-7965
 URL: https://issues.apache.org/jira/browse/MESOS-7965
 Project: Mesos
  Issue Type: Bug
  Components: cmake, webui
 Environment: Any build using CMake.
Reporter: Andrew Schwartzmeyer


The default directory for the WebUI assets is not correctly set in CMake 
builds. While a user can work around this via {{./src/mesos-master 
--webui_dir=../src/webui}}, ideally the default would "just work."

{noformat}
src/master/flags.cpp
166:  add(&Flags::webui_dir,
167:  "webui_dir",
168:  "Directory path of the webui files/assets",
169:  PKGDATADIR "/webui");
199:  "Human readable name for the cluster, displayed in the webui.");
{noformat}

We currently set {{PKGDATADIR}} to a value that is not quite right, which leaves 
the default search path for the {{webui}} assets broken.

{noformat}
cmake/CompilationConfigure.cmake
351:  -DPKGDATADIR="${DATA_INSTALL_PREFIX}")

cmake/CompilationConfigure.cmake
246:  set(DATA_INSTALL_PREFIX  ${SHARE_INSTALL_PREFIX}/mesos)
351:  -DPKGDATADIR="${DATA_INSTALL_PREFIX}")

cmake/CompilationConfigure.cmake
245:  set(SHARE_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX}/share)
246:  set(DATA_INSTALL_PREFIX  ${SHARE_INSTALL_PREFIX}/mesos)

cmake/CompilationConfigure.cmake
321:  set(EXEC_INSTALL_PREFIX "WARNINGDONOTUSEME")
322:  set(LIBEXEC_INSTALL_DIR "WARNINGDONOTUSEME")
323:  set(PKG_LIBEXEC_INSTALL_DIR "WARNINGDONOTUSEME")
324:  set(LIB_INSTALL_DIR "WARNINGDONOTUSEME")
325:  set(TEST_LIB_EXEC_DIR   "WARNINGDONOTUSEME")
326:  set(PKG_MODULE_DIR  "WARNINGDONOTUSEME")
327:  set(S_BIN_DIR   "WARNINGDONOTUSEME")
{noformat}
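
For illustration, the default {{--webui_dir}} value is formed by preprocessor
string concatenation with {{PKGDATADIR}}, so whatever value CMake passes via
{{-DPKGDATADIR}} ends up verbatim in the search path. A minimal standalone
sketch of that mechanism follows; the fallback value below is only an assumption
for illustration, not what the Mesos build defines:

{noformat}
#include <iostream>

// Stand-in for the value injected via -DPKGDATADIR="..." by the build; the
// real build derives it from DATA_INSTALL_PREFIX.
#ifndef PKGDATADIR
#define PKGDATADIR "/usr/local/share/mesos"
#endif

int main()
{
  // The --webui_dir default is built by compile-time string concatenation,
  // so an incorrect PKGDATADIR directly yields an unusable default path.
  const char* webuiDir = PKGDATADIR "/webui";
  std::cout << "default webui_dir: " << webuiDir << std::endl;
  return 0;
}
{noformat}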



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-7833) stderr/stdout logs are failing to be served to Marathon

2017-09-11 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-7833:
---

Assignee: John Kordich

> stderr/stdout logs are failing to be served to Marathon
> ---
>
> Key: MESOS-7833
> URL: https://issues.apache.org/jira/browse/MESOS-7833
> Project: Mesos
>  Issue Type: Bug
> Environment: Windows 10 mesos-agent using the Mesos Containerizer
> CentOS 7 Marathon + mesos-master + zookeeper
> Deployed following [this 
> guide|https://github.com/Microsoft/mesos-log/blob/master/notes/deployment.md].
>Reporter: Andrew Schwartzmeyer
>Assignee: John Kordich
>  Labels: microsoft, windows
>
> Given an app in Marathon with the command {{powershell -noexit -c 
> get-process}}, we expect it to deploy, and the "Error Log" and "Output Log" 
> of the running instance to return the {{stderr}} and {{stdout}} files from 
> the agent.
> While the files exist on the agent with the appropriate contents, e.g. 
> {{work_dir\slaves\ff198863-667e-46b9-a64d-e22fdff3b3cb-S4\frameworks\ff198863-667e-46b9-a64d-e22fdff3b3cb-\executors\get-process.4211c4e3-7181-11e7-b702-00155dafc802\runs\7fc924b4-4ec1-4be6-9386-d4f7cc17d5ad}}
>  has {{stderr}} and {{stdout}}, and the latter has the output of 
> {{get-process}}, Marathon is unable to retrieve them.
> Clicking the link for the instance returns the error: "Sorry there was a 
> problem retrieving file. Click to retry."
> The Mesos master is receiving the request {{I0725 14:54:49.627329 226319 
> http.cpp:1133] HTTP GET for /master/state?jsonp=jsonp_15d7bbed282 from 
> 10.123.175.200:55885 ...}}, but no further logging is displayed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-11 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-7964:
---
Shepherd: Yan Xu

> Heavy-duty GC makes the agent unresponsive
> --
>
> Key: MESOS-7964
> URL: https://issues.apache.org/jira/browse/MESOS-7964
> Project: Mesos
>  Issue Type: Bug
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>
> An agent is observed to perform heavy-duty GC every half an hour:
> {noformat}
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
> usage 93.61%. Max allowed age: 0ns
> Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99022105972148days
> ...
> Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'
> ...
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
> usage 90.85%. Max allowed age: 0ns
> Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028708946667days
> ...
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'
> ...
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028598086815days
> ...
> Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
>  No such file or directory
> ...
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
> usage 91.39%. Max allowed age: 0ns
> Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning 
> directories with remaining removal time 1.99028057238519days
> ...
> Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'
> ...
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
> usage 94.56%. Max allowed age: 0ns
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning 
> directories with remaining removal time 1.98959316198222days
> ...
> Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
> mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
> '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
> {noformat}
> Each GC activity took 5+ minutes. During that period, the agent became 
> unresponsive, the health check timed out, and no endpoint responded either. 
> When a disk-usage GC is triggered, around 300 pruning actors would be generated 
> (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
> hypothesis is that these actors would use up all of the worker threads, and 
> some of them took a long time to finish (possibly due to many files to delete, 
> or too many fs operations at once, etc.).

[jira] [Created] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-11 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-7964:
--

 Summary: Heavy-duty GC makes the agent unresponsive
 Key: MESOS-7964
 URL: https://issues.apache.org/jira/browse/MESOS-7964
 Project: Mesos
  Issue Type: Bug
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao


An agent is observed to perform heavy-duty GC every half an hour:
{noformat}
Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk 
usage 93.61%. Max allowed age: 0ns
Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning directories 
with remaining removal time 1.99022105972148days
...
Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted 
'/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5'

...

Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk 
usage 90.85%. Max allowed age: 0ns
Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning directories 
with remaining removal time 1.99028708946667days
...
Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted 
'/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e'

...

Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk 
usage 91.39%. Max allowed age: 0ns
Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning directories 
with remaining removal time 1.99028598086815days
...
Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete 
'/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2':
 No such file or directory

...

Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk 
usage 91.39%. Max allowed age: 0ns
Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning directories 
with remaining removal time 1.99028057238519days
...
Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted 
'/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828'

...

Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk 
usage 94.56%. Max allowed age: 0ns
Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning directories 
with remaining removal time 1.98959316198222days
...
Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com 
mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted 
'/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe'
{noformat}

Each GC activity took 5+ minutes. During that period, the agent became 
unresponsive, the health check timed out, and no endpoint responded either. 
When a disk-usage GC is triggered, around 300 pruning actors would be generated 
(https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My 
hypothesis is that these actors would use up all of the worker threads, and some 
of them took a long time to finish (possibly due to many files to delete, or 
too many fs operations at once, etc.).
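
A rough illustration of the hypothesis, not the actual gc.cpp code: if each
pruning action performs a blocking removal on a small fixed-size worker pool,
any other work queued on the same pool (e.g. serving a health check) has to wait
until a worker frees up. The pool size, task counts, and timings below are
assumptions:

{noformat}
#include <chrono>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

int main()
{
  std::queue<std::function<void()>> work;
  std::mutex mutex;

  // ~300 "pruning" tasks, each standing in for a slow recursive removal
  // of an executor sandbox.
  for (int i = 0; i < 300; i++) {
    work.push([] {
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
    });
  }

  // Unrelated work queued on the same pool, e.g. answering a health check.
  work.push([] { std::cout << "health check served" << std::endl; });

  // Small fixed-size worker pool, similar in spirit to the worker threads
  // that all actors share.
  const unsigned int workers = 8;
  std::vector<std::thread> pool;
  for (unsigned int i = 0; i < workers; i++) {
    pool.emplace_back([&] {
      while (true) {
        std::function<void()> task;
        {
          std::lock_guard<std::mutex> lock(mutex);
          if (work.empty()) {
            return;
          }
          task = std::move(work.front());
          work.pop();
        }
        // While every worker is busy with a slow removal, the health check
        // above sits in the queue and cannot be served.
        task();
      }
    });
  }

  for (std::thread& t : pool) {
    t.join();
  }

  return 0;
}
{noformat}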



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-7963) Task groups can lose the container limitation status.

2017-09-11 Thread James Peach (JIRA)
James Peach created MESOS-7963:
--

 Summary: Task groups can lose the container limitation status.
 Key: MESOS-7963
 URL: https://issues.apache.org/jira/browse/MESOS-7963
 Project: Mesos
  Issue Type: Bug
  Components: containerization, executor
Reporter: James Peach


If you run a single task in a task group and that task fails with a container 
limitation, that status update can be lost and only the executor failure will 
be reported to the framework.

{noformat}
exec /opt/mesos/bin/mesos-execute --content_type=json 
--master=jpeach.apple.com:5050 '--task_group={
"tasks":
[
{

"name": "7f141aca-55fe-4bb0-af4b-87f5ee26986a",
"task_id": {"value" : "2866368d-7279-4657-b8eb-bf1d968e8ebf"},
"agent_id": {"value" : ""},
"resources": [{
"name": "cpus",
"type": "SCALAR",
"scalar": {
"value": 0.2
}
}, {
"name": "mem",
"type": "SCALAR",
"scalar": {
"value": 32
}
}, {
"name": "disk",
"type": "SCALAR",
"scalar": {
"value": 2
}
}
],
"command": {
"value": "sleep 2 ; /usr/bin/dd if=/dev/zero of=out.dat bs=1M 
count=64 ; sleep 1"
}
}
]
}'
I0911 11:48:01.480689  7340 scheduler.cpp:184] Version: 1.5.0
I0911 11:48:01.488868  7339 scheduler.cpp:470] New master detected at 
master@17.228.224.108:5050
Subscribed with ID aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
Submitted task group with tasks [ 2866368d-7279-4657-b8eb-bf1d968e8ebf ] to 
agent 'aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-S0'
Received status update TASK_RUNNING for task 
'2866368d-7279-4657-b8eb-bf1d968e8ebf'
  source: SOURCE_EXECUTOR
Received status update TASK_FAILED for task 
'2866368d-7279-4657-b8eb-bf1d968e8ebf'
  message: 'Command terminated with signal Killed'
  source: SOURCE_EXECUTOR
{noformat}

However, the agent logs show that this failed with a memory limitation:
{noformat}
I0911 11:48:02.235818  7012 http.cpp:532] Processing call WAIT_NESTED_CONTAINER
I0911 11:48:02.236395  7013 status_update_manager.cpp:323] Received status 
update TASK_RUNNING (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
I0911 11:48:02.237083  7016 slave.cpp:4875] Forwarding the update TASK_RUNNING 
(UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010 to master@17.228.224.108:5050
I0911 11:48:02.283661  7007 status_update_manager.cpp:395] Received status 
update acknowledgement (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
I0911 11:48:04.771455  7014 memory.cpp:516] OOM detected for container 
474388fe-43c3-4372-b903-eaca22740996
I0911 11:48:04.776445  7014 memory.cpp:556] Memory limit exceeded: Requested: 
64MB Maximum Used: 64MB
...
I0911 11:48:04.776943  7012 containerizer.cpp:2681] Container 
474388fe-43c3-4372-b903-eaca22740996 has reached its limit for resource 
[{"name":"mem","scalar":{"value":64.0},"type":"SCALAR"}] and will be terminated
{noformat}

The following {{mesos-execute}} task will show the container limitation 
correctly:

{noformat}
exec /opt/mesos/bin/mesos-execute --content_type=json 
--master=jpeach.apple.com:5050 '--task_group={
"tasks":
[
{

"name": "37db08f6-4f0f-4ef6-97ee-b10a5c5cc211",
"task_id": {"value" : "1372b2e2-c501-4e80-bcbd-1a5c5194e206"},
"agent_id": {"value" : ""},
"resources": [{
"name": "cpus",
"type": "SCALAR",
"scalar": {
"value": 0.2
}
},
{
"name": "mem",
"type": "SCALAR",
"scalar": {
"value": 32
}
}],
"command": {
"value": "sleep 600"
}
}, {

"name": "7247643c-5e4d-4b01-9839-e38db49f7f4d",
"task_id": {"value" : "a7571608-3a53-4971-a187-41ed8be183ba"},
"agent_id": {"value" : ""},
"resources": [{
"name": "cpus",
"type": "SCALAR",
"scalar": {
"value": 0.2
}
}, {
"name": "mem",
"type": "SCALAR",
"scalar": {
"value": 32
}
}, {
"name": "disk",
"type": "SCALAR",

[jira] [Created] (MESOS-7962) Display task state counters in the framework page of the webui.

2017-09-11 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-7962:
--

 Summary: Display task state counters in the framework page of the 
webui.
 Key: MESOS-7962
 URL: https://issues.apache.org/jira/browse/MESOS-7962
 Project: Mesos
  Issue Type: Improvement
  Components: webui
Reporter: Benjamin Mahler


Currently the webui displays task state counters across all frameworks on the 
home page, but it does not display the per-framework task state counters when 
you click in to a particular framework. We should add the task state counters 
to the per-framework page.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-1976) Sandbox browse UI has path which is not selectable

2017-09-11 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-1976:
--

Resolution: Fixed
  Assignee: haosdent

Duplicated by and fixed in MESOS-7468.

> Sandbox browse UI has path which is not selectable
> --
>
> Key: MESOS-1976
> URL: https://issues.apache.org/jira/browse/MESOS-1976
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 0.20.1
>Reporter: Steven Schlansker
>Assignee: haosdent
>Priority: Minor
>
> The Sandbox UI displays the path being browsed as a series of links.  It is 
> not possible to copy the path from this; it ends up being formatted as e.g. 
> {code}
>  mnt
> mesos
> slaves
> 20141022-230146-2500085258-5050-1554-3
> frameworks
> Singularity
> executors
> ci-discovery-singularity-bridge-steven.2014.10.21T21.00.04-1414092693380-2-10-us_west_2a
> runs
> 554eebb3-126d-42bd-95c2-aa8282b05522 
> {code}
> instead of the expected
> {code}
> /mnt/mesos/slaves/20141022-230146-2500085258-5050-1554-3/frameworks/Singularity/executors/ci-discovery-singularity-bridge-steven.2014.10.21T21.00.04-1414092693380-2-10-us_west_2a/runs/554eebb3-126d-42bd-95c2-aa8282b05522
>  
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-4341) Break up used / allocated / available resources

2017-09-11 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161778#comment-16161778
 ] 

Benjamin Mahler commented on MESOS-4341:


This ticket combines too many things:

{quote}
Used / allocated naming is very inconsistent now. Sometimes allocated means 
allocated, sometimes "total available": http://imgur.com/zLEX5pU
{quote}

This was a bug, fixed here: 
https://github.com/apache/mesos/commit/65ecc0a7c87880f382c13062d4d21c0cd178c945

{quote}
I propose to have the following:
used / allocated for each framework and task
used / allocated / total for each slave and for cluster as a whole
Master now shows idle and offered, slaves have "slack" capacity too that might 
be used for preemptive tasks. Not sure if it should be visualized as well.
It looks like fetching cluster-wide utilization requires fetching stats from 
each slave, so it can be added separately later.
{quote}

What I'm distilling from this is a request for showing utilization from the 
master pages, since we already display it per agent. We haven't done this 
since, as you said, the master does not have utilization information of the 
whole cluster at the current time. It's possible we could build something for 
this. I'll close this out; can you file a ticket with the specific request for 
displaying utilization from the cluster-wide pages?

> Break up used / allocated / available resources
> ---
>
> Key: MESOS-4341
> URL: https://issues.apache.org/jira/browse/MESOS-4341
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Ivan Babrou
> Attachments: really.png, Screen Shot 2016-01-12 at 8.51.39 PM.png
>
>
> Used / allocated naming is very inconsistent now. Sometimes allocated means 
> allocated, sometimes "total available": http://imgur.com/zLEX5pU
> I propose to have the following:
> * used / allocated for each framework and task
> * used / allocated / total for each slave and for cluster as a whole
> Master now shows idle and offered, slaves have "slack" capacity too that 
> might be used for preemptive tasks. Not sure if it should be visualized as 
> well.
> It looks like fetching cluster-wide utilization requires fetching stats from 
> each slave, so it can be added separately later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-7961) Display task health in the webui.

2017-09-11 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-7961:
--

 Summary: Display task health in the webui.
 Key: MESOS-7961
 URL: https://issues.apache.org/jira/browse/MESOS-7961
 Project: Mesos
  Issue Type: Improvement
  Components: webui
Reporter: Benjamin Mahler


Currently the webui does not display task health based on the latest status 
update. Since this information is in the protobuf, it is within the webui's 
scope to display health information.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7867) Master doesn't handle scheduler driver downgrade from HTTP based to PID based

2017-09-11 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7867:
--
Shepherd: Anand Mazumdar

> Master doesn't handle scheduler driver downgrade from HTTP based to PID based
> -
>
> Key: MESOS-7867
> URL: https://issues.apache.org/jira/browse/MESOS-7867
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.3.0
>Reporter: Ilya Pronin
>Assignee: Ilya Pronin
>
> When a framework upgrades from a PID based driver to an HTTP based driver, 
> master removes its per-framework-principal metrics ({{messages_received}} and 
> {{messages_processed}}) in {{Master::failoverFramework}}. When the same 
> framework downgrades back to a PID based driver, the master doesn't reinstate 
> those metrics. This causes a crash when the master receives a message from 
> the failed over framework and increments {{messages_received}} counter in 
> {{Master::visit(const MessageEvent&)}}.
> {noformat}
> I0807 18:17:45.713220 19095 master.cpp:2916] Framework 
> 70822e80-ca38-4470-916e-e6da073a4742- (TwitterScheduler) failed over
> F0807 18:18:20.725908 19079 master.cpp:1451] Check failed: 
> metrics->frameworks.contains(principal.get())
> {noformat}
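
A generic sketch of the failure mode, not the actual Mesos master code: a
per-principal counter map whose entry is removed on failover and never
reinstated, so a later increment path that assumes the key exists trips its
check. The names below are illustrative only:

{noformat}
#include <iostream>
#include <map>
#include <string>

int main()
{
  std::map<std::string, int> messagesReceived;

  messagesReceived["principal"] = 0;    // PID-based scheduler registers.
  messagesReceived.erase("principal");  // Failover to the HTTP driver drops
                                        // the per-framework metrics.

  // The framework downgrades back to a PID-based driver. The metrics are not
  // reinstated, so the equivalent of the master's CHECK fails here.
  if (messagesReceived.count("principal") == 0) {
    std::cerr << "Check failed: metrics->frameworks.contains(principal)\n";
    return 1;
  }

  messagesReceived["principal"]++;  // Never reached in this scenario.
  return 0;
}
{noformat}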



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-7960) Deprecate non virtual path browse/read for sandbox

2017-09-11 Thread Zhitao Li (JIRA)
Zhitao Li created MESOS-7960:


 Summary: Deprecate non virtual path browse/read for sandbox
 Key: MESOS-7960
 URL: https://issues.apache.org/jira/browse/MESOS-7960
 Project: Mesos
  Issue Type: Improvement
Reporter: Zhitao Li
Priority: Minor


We added support for browsing and reading files in the executor's latest sandbox 
run directory in MESOS-7899. We should remove support for the physical path after 
Mesos 2.0 because it requires the {{work_dir}} and {{agent_id}}, which are not 
necessary to expose to frameworks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7634) OsTest.ChownNoAccess fails on s390x machines

2017-09-11 Thread Prajakta Bhatekar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161209#comment-16161209
 ] 

Prajakta Bhatekar commented on MESOS-7634:
--

Hi Vinod,

It looks like the failure in OsTest.ChownNoAccess is due to a wrong status being 
set by the FTS functions of gnulib. They are provided by the gnutils (coreutils 
on Ubuntu) package and are responsible for traversing the directory structure.
On s390x, the status flag is set to FTS_DP (i.e., post-order directory) instead 
of the expected FTS_DNR (unreadable directory) for a directory with 0 permissions.

At line 47 in 3rdparty/stout/include/stout/os/posix/chown.hpp, the "fts_info" 
status flag of the FTSENT structure returned by fts_read() is set to 6 (FTS_DP, 
post-order directory). The expected value is 4 (FTS_DNR, unreadable directory). 
Thus the default case gets executed and no error is returned, causing the 
assertion to fail.

We are investigating why the status flag is set differently on x86 and s390x.
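
For reference, here is a minimal sketch of the fts(3) traversal pattern
involved, not the actual stout code: an unreadable directory is expected to
surface as FTS_DNR, so if the platform reports FTS_DP instead, the error branch
below is never taken:

{noformat}
#include <fts.h>
#include <cstdio>
#include <cstring>

int main(int argc, char** argv)
{
  if (argc < 2) {
    fprintf(stderr, "usage: %s <directory>\n", argv[0]);
    return 1;
  }

  char* paths[] = {argv[1], nullptr};

  FTS* tree = ::fts_open(paths, FTS_NOCHDIR | FTS_PHYSICAL, nullptr);
  if (tree == nullptr) {
    perror("fts_open");
    return 1;
  }

  FTSENT* node;
  while ((node = ::fts_read(tree)) != nullptr) {
    switch (node->fts_info) {
      case FTS_DNR:  // Unreadable directory: expected for a 0-permission dir.
      case FTS_ERR:
      case FTS_NS:
        fprintf(stderr, "error at %s: %s\n",
                node->fts_path, strerror(node->fts_errno));
        break;
      case FTS_DP:   // Post-order visit of a directory: normally not an error,
      default:       // so a traversal that only checks FTS_DNR reports nothing.
        printf("fts_info=%d for %s\n", node->fts_info, node->fts_path);
        break;
    }
  }

  ::fts_close(tree);
  return 0;
}
{noformat}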

> OsTest.ChownNoAccess fails on s390x machines
> 
>
> Key: MESOS-7634
> URL: https://issues.apache.org/jira/browse/MESOS-7634
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Nayana Thorat
>
> Running a custom branch of Mesos (with some fixes in docker build scripts for 
> s390x) on s390x based CI machines throws the following error when running 
> stout tests.
> {code}
> [ RUN  ] OsTest.ChownNoAccess
> ../../../../3rdparty/stout/tests/os_tests.cpp:839: Failure
> Value of: os::chown(uid.get(), gid.get(), "one", true).isError()
>   Actual: false
> Expected: true
> ../../../../3rdparty/stout/tests/os_tests.cpp:840: Failure
> Value of: os::chown(uid.get(), gid.get(), "one/two", true).isError()
>   Actual: false
> {code}
> One can repro this by building Mesos from my custom branch here: 
> https://github.com/vinodkone/mesos/tree/vinod/s390x



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7959) Missing documentation for registry garbage collection flags

2017-09-11 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7959:

Description: 
* The master registry garbage collection-related master flags 
({{registry_gc_interval}}, {{registry_max_agent_age}}, 
{{registry_max_agent_count}}) appear to be missing from the documentation in 
e.g., {{configuration.md}}.

We should add documentation for them, and audit the existing documentation for 
additional missing flags.

  was:
The master registry garbage collection-related master flags (appear to be 
missing from the documentation in e.g., {{configuration.md}}.

We should add documentation for them, and audit the existing documentation for 
additional missing flags.


> Missing documentation for registry garbage collection flags
> ---
>
> Key: MESOS-7959
> URL: https://issues.apache.org/jira/browse/MESOS-7959
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, master
>Reporter: Benjamin Bannier
>  Labels: mesosphere, newbie
>
> * The master registry garbage collection-related master flags 
> ({{registry_gc_interval}}, {{registry_max_agent_age}}, 
> {{registry_max_agent_count}}) appear to be missing from the documentation in 
> e.g., {{configuration.md}}.
> We should add documentation for them, and audit the existing documentation 
> for additional missing flags.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7959) Missing documentation for registry garbage collection flags

2017-09-11 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7959:

Description: 
The master registry garbage collection-related master flags (appear to be 
missing from the documentation in e.g., {{configuration.md}}.

We should add documentation for them, and audit the existing documentation for 
additional missing flags.

  was:
The master registry garbage collection-related master flags appear to be 
missing from the documentation in e.g., {{configuration.md}}.

We should add documentation for them, and audit the existing documentation for 
additional missing flags.


> Missing documentation for registry garbage collection flags
> ---
>
> Key: MESOS-7959
> URL: https://issues.apache.org/jira/browse/MESOS-7959
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, master
>Reporter: Benjamin Bannier
>  Labels: mesosphere, newbie
>
> The master registry garbage collection-related master flags (appear to be 
> missing from the documentation in e.g., {{configuration.md}}.
> We should add documentation for them, and audit the existing documentation 
> for additional missing flags.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-7959) Missing documentation for registry garbage collection flags

2017-09-11 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-7959:
---

 Summary: Missing documentation for registry garbage collection 
flags
 Key: MESOS-7959
 URL: https://issues.apache.org/jira/browse/MESOS-7959
 Project: Mesos
  Issue Type: Documentation
  Components: documentation, master
Reporter: Benjamin Bannier


The master registry garbage collection-related master flags appear to be 
missing from the documentation in e.g., {{configuration.md}}.

We should add documentation for them, and audit the existing documentation for 
additional missing flags.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7924) Add a javascript linter to the webui.

2017-09-11 Thread Armand Grillet (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161003#comment-16161003
 ] 

Armand Grillet commented on MESOS-7924:
---

I have worked on the review request linked above over the weekend. This is 
currently a WIP that I have published mainly to get your review of the file 
{{.eslintrc.js}}; this file defines which errors/warnings the linter should 
display, so a well-considered review of it is important. I created the one in the 
patch by running {{eslint --init}} and answering these questions without being too 
careful:

{code}
(web-ui) static (eslint) $ eslint --init
? How would you like to configure ESLint? Inspect your JavaScript file(s)
? Which file(s), path(s), or glob(s) should be examined? 
/Users/Armand/Code/apache-mesos/src/webui/master/static/js/app.js
? What format do you want your config file to be in? JavaScript
? Are you using ECMAScript 6 features? No
? Where will your code run? Browser
? Do you use CommonJS? No
? Do you use JSX? No
{code}

The rest of the patch is similar to what we do for our Python linter. This 
means adding a {{.gitignore}} to the repository, which is not conventional but 
already done in {{src/python/cli_new}}; adding {{.virtualenv}} to the root 
{{.gitignore}} is also a possibility.

> Add a javascript linter to the webui.
> -
>
> Key: MESOS-7924
> URL: https://issues.apache.org/jira/browse/MESOS-7924
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Armand Grillet
>  Labels: tech-debt
>
> As far as I can tell, javascript linters (e.g. ESLint) help catch some 
> functional errors as well, for example, we've made some "strict" mistakes a 
> few times that ESLint can catch: MESOS-6624, MESOS-7912.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (MESOS-7944) Implement jemalloc support for Mesos

2017-09-11 Thread Benno Evers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161001#comment-16161001
 ] 

Benno Evers edited comment on MESOS-7944 at 9/11/17 9:59 AM:
-

Since I started working on this, I now have a much better idea of what needs 
to be done.

First of all, since the added features are not Mesos-specific, I think it's 
best to add them directly to libprocess. However, the choice of preferred 
malloc should be up to the binary, not enforced by a shared library, so instead 
of compiling against jemalloc we should detect at runtime whether we're running 
under jemalloc or not (similar to what folly does here: 
https://github.com/facebook/folly/blob/master/folly/Malloc.h#L150).

At the endpoint, the minimum features I would like are the ability to get the 
(exact) heap allocation statistics as JSON, or to download the current (stochastic) 
heap profile dumps as files. Depending on the complexity, we should also 
think about providing a way to have the master dump profiles periodically and 
store them on disk, and a way to generate jeprof graphs automatically.

Finally, the new `--enable-memory-profiling` configure option (tentative name) 
for Mesos would build a bundled version of jemalloc with all the necessary 
configuration options enabled, and link the mesos-master and mesos-slave 
binaries against this library.
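
As an illustration of the runtime detection mentioned above, here is a rough sketch 
(not the eventual libprocess API; it assumes an unprefixed jemalloc build that 
exports {{mallctl}}, and the helper name is made up):

{code}
// jemalloc_detect_sketch.cpp -- illustrative only.
// Build (hypothetical): g++ -o jemalloc_detect jemalloc_detect_sketch.cpp -ldl
#include <dlfcn.h>

#include <iostream>

// jemalloc exports `mallctl` while glibc malloc does not, so a successful
// lookup in the global symbol table suggests jemalloc is the active
// allocator (whether linked in directly or injected via LD_PRELOAD).
static bool runningUnderJemalloc()
{
  return ::dlsym(RTLD_DEFAULT, "mallctl") != nullptr;
}

int main()
{
  std::cout << "jemalloc detected: "
            << (runningUnderJemalloc() ? "yes" : "no") << std::endl;
  return 0;
}
{code}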


was (Author: bennoe):
Since I've started to work on this, I have now a much sharper idea of what 
needs to be done.

First of all, since the added features are not mesos-specific, I think it's 
best to add them directly to libprocess. However, the choice of preferred 
malloc should be up the binary, not enforced by a shared library, so instead 
compiling against jemalloc we should detect at runtime whether we're running 
under jemalloc or not. (similar to what folly does here: 
https://github.com/facebook/folly/blob/master/folly/Malloc.h#L150)

At the endpoint, the minimum features I would like are the ability to get the 
(exact) heap allocation statistics as JSON, or download current (stochastic) 
heap profile dumps as files. Depending on the complexity of it, we should also 
think about providing a way to have the master dump profiles periodically and 
store them on disk, and a way to generate jeprof-graphs automatically.

Finally, the new `--enable-memory-profiling` configure option (tentative name) 
for mesos would build a bundled version of jemalloc with all the necessary 
configuration options enabled, and link the mesos-master and mesos-slave 
binaries against this library.

> Implement jemalloc support for Mesos
> 
>
> Key: MESOS-7944
> URL: https://issues.apache.org/jira/browse/MESOS-7944
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Assignee: Benno Evers
>
> After investigation in MESOS-7876 and discussion on the mailing list, this 
> task is for tracking progress on adding out-of-the-box memory profiling 
> support using jemalloc to Mesos.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7944) Implement jemalloc support for Mesos

2017-09-11 Thread Benno Evers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161001#comment-16161001
 ] 

Benno Evers commented on MESOS-7944:


Since I started working on this, I now have a much sharper idea of what 
needs to be done.

First of all, since the added features are not Mesos-specific, I think it's 
best to add them directly to libprocess. However, the choice of preferred 
malloc should be up to the binary, not enforced by a shared library, so instead 
of compiling against jemalloc we should detect at runtime whether we're running 
under jemalloc or not (similar to what folly does here: 
https://github.com/facebook/folly/blob/master/folly/Malloc.h#L150).

At the endpoint, the minimum features I would like are the ability to get the 
(exact) heap allocation statistics as JSON, or to download the current (stochastic) 
heap profile dumps as files. Depending on the complexity, we should also 
think about providing a way to have the master dump profiles periodically and 
store them on disk, and a way to generate jeprof graphs automatically.

Finally, the new `--enable-memory-profiling` configure option (tentative name) 
for Mesos would build a bundled version of jemalloc with all the necessary 
configuration options enabled, and link the mesos-master and mesos-slave 
binaries against this library.

> Implement jemalloc support for Mesos
> 
>
> Key: MESOS-7944
> URL: https://issues.apache.org/jira/browse/MESOS-7944
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Assignee: Benno Evers
>
> After investigation in MESOS-7876 and discussion on the mailing list, this 
> task is for tracking progress on adding out-of-the-box memory profiling 
> support using jemalloc to Mesos.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-7924) Add a javascript linter to the webui.

2017-09-11 Thread Armand Grillet (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armand Grillet reassigned MESOS-7924:
-

Assignee: Armand Grillet

> Add a javascript linter to the webui.
> -
>
> Key: MESOS-7924
> URL: https://issues.apache.org/jira/browse/MESOS-7924
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Armand Grillet
>  Labels: tech-debt
>
> As far as I can tell, javascript linters (e.g. ESLint) help catch some 
> functional errors as well, for example, we've made some "strict" mistakes a 
> few times that ESLint can catch: MESOS-6624, MESOS-7912.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7958) The example framework `test-framework` is broken.

2017-09-11 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160835#comment-16160835
 ] 

Michael Park commented on MESOS-7958:
-

The example frameworks should work without having to be installed. The fact 
that it works with {{LD_LIBRARY_PATH}} set is good to know, but it used to work 
without that, and I think it still should.

> The example framework `test-framework` is broken.
> -
>
> Key: MESOS-7958
> URL: https://issues.apache.org/jira/browse/MESOS-7958
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Reporter: Michael Park
> Attachments: screenshot-1.png
>
>
> The {{test-framework}} example framework does not work.
> Launching a cluster like so:
> {code}
> MESOS_RESOURCES="cpus:32;mem:512;disk:1024" MESOS_REGISTRY="in_memory" 
> ./bin/mesos-local.sh --num_slaves=1 --ip=127.0.0.1 --port=4040 
> --work_dir=$HOME/mesos-local
> {code}
> and trying to launch the {{test-framework}} like so:
> {code}
> ./src/test-framework --master=127.0.0.1:4040
> {code}
> {code}
> /home/mpark/projects/mesos/build/src/.libs/test-executor: error while loading 
> shared libraries: libmesos-1.5.0.so: cannot open shared object file: No such 
> file or directory
> {code}
> It seems that {{test-executor}} cannot load {{libmesos.so}} correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7958) The example framework `test-framework` is broken.

2017-09-11 Thread Christopher Ogle (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160831#comment-16160831
 ] 

Christopher Ogle commented on MESOS-7958:
-

Yep, that worked. [~mcypark] would you maybe want me to explore setting up a 
Troubleshooting section that tracks and attempts to answer issues like these.

> The example framework `test-framework` is broken.
> -
>
> Key: MESOS-7958
> URL: https://issues.apache.org/jira/browse/MESOS-7958
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Reporter: Michael Park
> Attachments: screenshot-1.png
>
>
> The {{test-framework}} example framework does not work.
> Launching a cluster like so:
> {code}
> MESOS_RESOURCES="cpus:32;mem:512;disk:1024" MESOS_REGISTRY="in_memory" 
> ./bin/mesos-local.sh --num_slaves=1 --ip=127.0.0.1 --port=4040 
> --work_dir=$HOME/mesos-local
> {code}
> and trying to launch the {{test-framework}} like so:
> {code}
> ./src/test-framework --master=127.0.0.1:4040
> {code}
> {code}
> /home/mpark/projects/mesos/build/src/.libs/test-executor: error while loading 
> shared libraries: libmesos-1.5.0.so: cannot open shared object file: No such 
> file or directory
> {code}
> It seems that {{test-executor}} cannot load {{libmesos.so}} correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (MESOS-7958) The example framework `test-framework` is broken.

2017-09-11 Thread Christopher Ogle (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160831#comment-16160831
 ] 

Christopher Ogle edited comment on MESOS-7958 at 9/11/17 7:23 AM:
--

Yep, that worked. [~mcypark] would you maybe want me to explore setting up a 
Troubleshooting section that tracks and attempts to answer issues like these?


was (Author: cogle):
Yep, that worked. [~mcypark] would you maybe want to me to explore setting up a 
Troubleshooting section that tracks and attempts answer issues like these.

> The example framework `test-framework` is broken.
> -
>
> Key: MESOS-7958
> URL: https://issues.apache.org/jira/browse/MESOS-7958
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Reporter: Michael Park
> Attachments: screenshot-1.png
>
>
> The {{test-framework}} example framework does not work.
> Launching a cluster like so:
> {code}
> MESOS_RESOURCES="cpus:32;mem:512;disk:1024" MESOS_REGISTRY="in_memory" 
> ./bin/mesos-local.sh --num_slaves=1 --ip=127.0.0.1 --port=4040 
> --work_dir=$HOME/mesos-local
> {code}
> and trying to launch the {{test-framework}} like so:
> {code}
> ./src/test-framework --master=127.0.0.1:4040
> {code}
> {code}
> /home/mpark/projects/mesos/build/src/.libs/test-executor: error while loading 
> shared libraries: libmesos-1.5.0.so: cannot open shared object file: No such 
> file or directory
> {code}
> It seems that {{test-executor}} cannot load {{libmesos.so}} correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (MESOS-7958) The example framework `test-framework` is broken.

2017-09-11 Thread Christopher Ogle (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160831#comment-16160831
 ] 

Christopher Ogle edited comment on MESOS-7958 at 9/11/17 7:23 AM:
--

Yep, that worked, thanks. [~mcypark] would you maybe want me to explore 
setting up a Troubleshooting section that tracks and attempts to answer issues 
like these?


was (Author: cogle):
Yep, that worked. [~mcypark] would you maybe want to me to explore setting up a 
Troubleshooting section that tracks and attempts answer issues like these?

> The example framework `test-framework` is broken.
> -
>
> Key: MESOS-7958
> URL: https://issues.apache.org/jira/browse/MESOS-7958
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Reporter: Michael Park
> Attachments: screenshot-1.png
>
>
> The {{test-framework}} example framework does not work.
> Launching a cluster like so:
> {code}
> MESOS_RESOURCES="cpus:32;mem:512;disk:1024" MESOS_REGISTRY="in_memory" 
> ./bin/mesos-local.sh --num_slaves=1 --ip=127.0.0.1 --port=4040 
> --work_dir=$HOME/mesos-local
> {code}
> and trying to launch the {{test-framework}} like so:
> {code}
> ./src/test-framework --master=127.0.0.1:4040
> {code}
> {code}
> /home/mpark/projects/mesos/build/src/.libs/test-executor: error while loading 
> shared libraries: libmesos-1.5.0.so: cannot open shared object file: No such 
> file or directory
> {code}
> It seems that {{test-executor}} cannot load {{libmesos.so}} correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)