[jira] [Updated] (MESOS-7957) The REGISTER_FRAMEWORK_WITH_ROLE action is not used in the source code
[ https://issues.apache.org/jira/browse/MESOS-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jackyoh updated MESOS-7957: --- Shepherd: Adam B > The REGISTER_FRAMEWORK_WITH_ROLE action is not used in the source code > > > Key: MESOS-7957 > URL: https://issues.apache.org/jira/browse/MESOS-7957 > Project: Mesos > Issue Type: Improvement > Components: security >Reporter: jackyoh >Priority: Trivial > > The Mesos test code exercises the REGISTER_FRAMEWORK_WITH_ROLE action in > src/tests/authorization_tests.cpp, but the source code itself does not > use the REGISTER_FRAMEWORK_WITH_ROLE action. > Can I remove the REGISTER_FRAMEWORK_WITH_ROLE action from > src/tests/authorization_tests.cpp? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Issue Comment Deleted] (MESOS-7964) Heavy-duty GC makes the agent unresponsive
[ https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun-Hung Hsiao updated MESOS-7964: --- Comment: was deleted (was: Patch for review: https://reviews.apache.org/r/62230/) > Heavy-duty GC makes the agent unresponsive > -- > > Key: MESOS-7964 > URL: https://issues.apache.org/jira/browse/MESOS-7964 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao > Fix For: 1.4.1 > > > An agent was observed to perform heavy-duty GC every half an hour: > {noformat} > Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk > usage 93.61%. Max allowed age: 0ns > Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning > directories with remaining removal time 1.99022105972148days > ... > Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5' > ... > Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk > usage 90.85%. Max allowed age: 0ns > Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning > directories with remaining removal time 1.99028708946667days > ... 
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e' > ... > Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk > usage 91.39%. Max allowed age: 0ns > Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning > directories with remaining removal time 1.99028598086815days > ... > Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2': > No such file or directory > ... > Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk > usage 91.39%. Max allowed age: 0ns > Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning > directories with remaining removal time 1.99028057238519days > ... > Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828' > ... 
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk > usage 94.56%. Max allowed age: 0ns > Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning > directories with remaining removal time 1.98959316198222days > ... > Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted > '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe' > {noformat} > Each GC activity took 5+ minutes. During that period the agent became > unresponsive: health checks timed out and no endpoints responded either. > When a disk-usage GC is triggered, around 300 pruning actors are generated > (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My > hypothesis is that these actors would use all of the worker threads, and > some of them took a long time to finish (possibly due to many
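The arithmetic behind this starvation hypothesis can be sketched with a small back-of-the-envelope model (the pool size and the per-deletion latency below are assumptions for illustration, not measured values): blocking deletions drain in waves on a fixed-size worker pool, and every other actor waits for a free thread in the meantime.

```c
/* Hypothetical model for the starvation hypothesis: `tasks` blocking
 * directory deletions drain on a fixed pool of `workers` threads, so
 * unrelated actors (endpoints, health checks) queue behind them. */

/* Waves needed to drain `tasks` blocking tasks on `workers` threads. */
int drainRounds(int tasks, int workers) {
  return (tasks + workers - 1) / workers;  /* ceiling division */
}

/* Wall-clock seconds if every deletion blocks for `secsPerTask`. */
int drainSeconds(int tasks, int workers, int secsPerTask) {
  return drainRounds(tasks, workers) * secsPerTask;
}
```

With the ~300 pruning actors observed here, an assumed pool of 8 worker threads and an assumed 10 seconds per directory removal, this gives 38 waves, roughly 380 seconds of saturation, which is in the same ballpark as the 5+ minute stalls in the logs above.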
[jira] [Comment Edited] (MESOS-6428) Mesos containerizer helper function signalSafeWriteStatus is not AS-Safe
[ https://issues.apache.org/jira/browse/MESOS-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162355#comment-16162355 ] Till Toenshoff edited comment on MESOS-6428 at 9/12/17 1:51 AM: [~jamespeach] we commonly remove the duplication from the RR subject / description from our commit messages before pushing. So instead of {noformat} commit 905c758782f8587276ee207261277517a34482a2 Author: Andrei Budnik Date: Wed Sep 6 22:02:29 2017 -0700 Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`. Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`. Review: https://reviews.apache.org/r/61801/ {noformat} it would be {noformat} commit 905c758782f8587276ee207261277517a34482a2 Author: Andrei Budnik Date: Wed Sep 6 22:02:29 2017 -0700 Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`. Review: https://reviews.apache.org/r/61801/ {noformat} was (Author: tillt): [~jamespeach] we commonly remove the duplication from the RR subject / description from our commit messages before pushing. So instead of ``` commit 905c758782f8587276ee207261277517a34482a2 Author: Andrei Budnik Date: Wed Sep 6 22:02:29 2017 -0700 Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`. Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`. Review: https://reviews.apache.org/r/61801/ ``` it would be ``` commit 905c758782f8587276ee207261277517a34482a2 Author: Andrei Budnik Date: Wed Sep 6 22:02:29 2017 -0700 Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`. 
Review: https://reviews.apache.org/r/61801/ ``` > Mesos containerizer helper function signalSafeWriteStatus is not AS-Safe > > > Key: MESOS-6428 > URL: https://issues.apache.org/jira/browse/MESOS-6428 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.1.0 >Reporter: Benjamin Bannier >Assignee: Andrei Budnik > Labels: newbie, tech-debt > Fix For: 1.5.0 > > > In {{src/slave/containerizer/mesos/launch.cpp}} a helper function > {{signalSafeWriteStatus}} is defined. Its name seems to suggest that this > function is safe to call in e.g., signal handlers, and it is used in this > file's {{signalHandler}} for exactly that purpose. > Currently this function is not AS-Safe since it e.g., allocates memory via > construction of {{string}} instances, and might destructively modify > {{errno}}. > We should clean up this function to be in fact AS-Safe. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
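For contrast with the problem described in this issue, here is a minimal sketch of what an AS-Safe variant could look like (an illustration only, not the actual Mesos implementation; the function name is borrowed from launch.cpp and the "status=<N>" message format is invented): it calls only write(2), which POSIX lists as async-signal-safe, formats the integer into a stack buffer with no heap allocation, and saves and restores errno so the interrupted code never observes it clobbered.

```c
/* Illustrative AS-Safe status writer (hypothetical message format):
 * no heap allocation, only the async-signal-safe write(2) syscall,
 * and errno preserved across the call. */
#include <errno.h>
#include <unistd.h>

/* Format a non-negative int as decimal into buf; returns the length
 * written. Uses only the stack and never touches errno. */
int formatStatus(int value, char* buf, int size) {
  char tmp[16];
  int len = 0;
  if (value == 0) tmp[len++] = '0';
  while (value > 0 && len < (int)sizeof(tmp)) {
    tmp[len++] = (char)('0' + value % 10);
    value /= 10;
  }
  if (len > size) return 0;
  for (int i = 0; i < len; i++) buf[i] = tmp[len - 1 - i];  /* reverse digits */
  return len;
}

/* Safe to call from a signal handler: writes "status=<N>\n" to fd. */
void signalSafeWriteStatus(int fd, int status) {
  int saved = errno;                 /* the handler must not clobber errno */
  char buf[32] = "status=";          /* stack buffer, no string allocation */
  int off = 7;
  off += formatStatus(status, buf + off, (int)sizeof(buf) - off - 1);
  buf[off++] = '\n';
  if (write(fd, buf, (size_t)off) < 0) {}  /* write(2) is AS-Safe */
  errno = saved;
}
```

A real fix would additionally have to ensure that nothing reachable from signalHandler constructs a std::string or otherwise allocates.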
[jira] [Commented] (MESOS-6428) Mesos containerizer helper function signalSafeWriteStatus is not AS-Safe
[ https://issues.apache.org/jira/browse/MESOS-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162355#comment-16162355 ] Till Toenshoff commented on MESOS-6428: --- [~jamespeach] we commonly remove the duplication from the RR subject / description from our commit messages before pushing. So instead of ``` commit 905c758782f8587276ee207261277517a34482a2 Author: Andrei Budnik Date: Wed Sep 6 22:02:29 2017 -0700 Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`. Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`. Review: https://reviews.apache.org/r/61801/ ``` it would be ``` commit 905c758782f8587276ee207261277517a34482a2 Author: Andrei Budnik Date: Wed Sep 6 22:02:29 2017 -0700 Used SAFE_EXIT macro in `CgroupsAnyHierarchyTest.ROOT_CGROUPS_Write`. Review: https://reviews.apache.org/r/61801/ ```
[jira] [Commented] (MESOS-2728) Introduce concept of cluster wide resources.
[ https://issues.apache.org/jira/browse/MESOS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162234#comment-16162234 ] Huadong Liu commented on MESOS-2728: [~jieyu], [~vinodkone] Is there an update on this? > Introduce concept of cluster wide resources. > > > Key: MESOS-2728 > URL: https://issues.apache.org/jira/browse/MESOS-2728 > Project: Mesos > Issue Type: Epic >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: external-volumes, mesosphere > > There are resources which are not provided by a single node. Consider, for > example, the external network bandwidth of a cluster. Being a limited resource, > it makes sense for Mesos to manage it, yet it is not a resource offered > by a single node. A cluster-wide resource is still consumed by a > task, and when that task completes, the resources become available to be > allocated to another framework/task. > Use Cases: > 1. Network Bandwidth > 2. IP Addresses > 3. Global Service Ports > 4. Distributed File System Storage > 5. Software Licences > 6. SAN Volumes
[jira] [Commented] (MESOS-5482) mesos/marathon task stuck in staging after slave reboot
[ https://issues.apache.org/jira/browse/MESOS-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162232#comment-16162232 ] Mao Geng commented on MESOS-5482: - [~chhsia0] the problem happened when the agent lost its connection with the master and re-registered; nobody was actually shutting down Marathon. MESOS-7215 looks like the root cause. When the agent re-registered, it shut down all executors of non-partition-aware frameworks, including the Marathon task. Meanwhile Marathon tried to launch a new task on the agent, and the agent ignored the launch because it thought the framework was shutting down, hence the task got stuck in the "staging" state. Then Marathon tried to kill the task because its deployment was overdue, which the agent ignored too. Restarting the agent resolved the issue, though. > mesos/marathon task stuck in staging after slave reboot > --- > > Key: MESOS-5482 > URL: https://issues.apache.org/jira/browse/MESOS-5482 > Project: Mesos > Issue Type: Bug >Reporter: lutful karim > Labels: tech-debt > Attachments: marathon-mesos-masters_after-reboot.log, > mesos-masters_mesos.log, mesos_slaves_after_reboot.log, > tasks_running_before_rebooot.marathon > > > The main idea of mesos/marathon is to sleep well, but after a node reboot a mesos > task gets stuck in staging for about 4 hours. > To reproduce the issue: > - set up a mesos cluster in HA mode with systemd-enabled mesos-master and > mesos-slave services. > - run the docker registry (https://hub.docker.com/_/registry/ ) with a mesos > constraint (hostname:LIKE:mesos-slave-1) on one node. Reboot the node and > notice that the task gets stuck in staging. > Possible workaround: service mesos-slave restart fixes the issue. 
> OS: centos 7.2 > mesos version: 0.28.1 > marathon: 1.1.1 > zookeeper: 3.4.8 > docker: 1.9.1 dockerAPIversion: 1.21 > error message: > May 30 08:38:24 euca-10-254-237-140 mesos-slave[832]: W0530 08:38:24.120013 > 909 slave.cpp:2018] Ignoring kill task > docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3 because the executor > 'docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3' of framework > 8517fcb7-f2d0-47ad-ae02-837570bef929- is terminating/terminated -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive
[ https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun-Hung Hsiao updated MESOS-7964: --- Component/s: agent
[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive
[ https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun-Hung Hsiao updated MESOS-7964: --- Sprint: Mesosphere Sprint 63
[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive
[ https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun-Hung Hsiao updated MESOS-7964: --- Story Points: 2 Fix Version/s: 1.4.1
[jira] [Created] (MESOS-7965) WebUI folder not set in CMake
Andrew Schwartzmeyer created MESOS-7965: --- Summary: WebUI folder not set in CMake Key: MESOS-7965 URL: https://issues.apache.org/jira/browse/MESOS-7965 Project: Mesos Issue Type: Bug Components: cmake, webui Environment: Any build using CMake. Reporter: Andrew Schwartzmeyer The default directory for the WebUI assets is not set correctly in CMake builds. While a user can work around this via {{./src/mesos-master --webui_dir=../src/webui}}, ideally the default would "just work." {noformat} src/master/flags.cpp 166: add(&Flags::webui_dir, 167: "webui_dir", 168: "Directory path of the webui files/assets", 169: PKGDATADIR "/webui"); 199: "Human readable name for the cluster, displayed in the webui."); {noformat} We currently set {{PKGDATADIR}} to an incorrect value, so the default search path for the {{webui}} assets does not work. {noformat} cmake/CompilationConfigure.cmake 351: -DPKGDATADIR="${DATA_INSTALL_PREFIX}") cmake/CompilationConfigure.cmake 246: set(DATA_INSTALL_PREFIX ${SHARE_INSTALL_PREFIX}/mesos) 351: -DPKGDATADIR="${DATA_INSTALL_PREFIX}") cmake/CompilationConfigure.cmake 245: set(SHARE_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX}/share) 246: set(DATA_INSTALL_PREFIX ${SHARE_INSTALL_PREFIX}/mesos) cmake/CompilationConfigure.cmake 321: set(EXEC_INSTALL_PREFIX "WARNINGDONOTUSEME") 322: set(LIBEXEC_INSTALL_DIR "WARNINGDONOTUSEME") 323: set(PKG_LIBEXEC_INSTALL_DIR "WARNINGDONOTUSEME") 324: set(LIB_INSTALL_DIR "WARNINGDONOTUSEME") 325: set(TEST_LIB_EXEC_DIR "WARNINGDONOTUSEME") 326: set(PKG_MODULE_DIR "WARNINGDONOTUSEME") 327: set(S_BIN_DIR "WARNINGDONOTUSEME") {noformat}
[jira] [Assigned] (MESOS-7833) stderr/stdout logs are failing to be served to Marathon
[ https://issues.apache.org/jira/browse/MESOS-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Schwartzmeyer reassigned MESOS-7833: --- Assignee: John Kordich > stderr/stdout logs are failing to be served to Marathon > --- > > Key: MESOS-7833 > URL: https://issues.apache.org/jira/browse/MESOS-7833 > Project: Mesos > Issue Type: Bug > Environment: Windows 10 mesos-agent using the Mesos Containerizer > CentOS 7 Marathon + mesos-master + zookeeper > Deployed following [this > guide|https://github.com/Microsoft/mesos-log/blob/master/notes/deployment.md]. >Reporter: Andrew Schwartzmeyer >Assignee: John Kordich > Labels: microsoft, windows > > Given an app in Marathon with the command {{powershell -noexit -c > get-process}}, we expect it to deploy, and the "Error Log" and "Output Log" > of the running instance to return the {{stderr}} and {{stdout}} files from > the agent. > While the files exist on the agent with the appropriate contents, e.g. > {{work_dir\slaves\ff198863-667e-46b9-a64d-e22fdff3b3cb-S4\frameworks\ff198863-667e-46b9-a64d-e22fdff3b3cb-\executors\get-process.4211c4e3-7181-11e7-b702-00155dafc802\runs\7fc924b4-4ec1-4be6-9386-d4f7cc17d5ad}} > has {{stderr}} and {{stdout}}, and the latter has the output of > {{get-process}}, Marathon is unable to retrieve them. > Clicking the link for the instance returns the error: "Sorry there was a > problem retrieving file. Click to retry." > The Mesos master is receiving the request {{I0725 14:54:49.627329 226319 > http.cpp:1133] HTTP GET for /master/state?jsonp=jsonp_15d7bbed282 from > 10.123.175.200:55885 ...}}, but no further logging is displayed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive
[ https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun-Hung Hsiao updated MESOS-7964: --- Shepherd: Yan Xu > Heavy-duty GC makes the agent unresponsive > -- > > Key: MESOS-7964 > URL: https://issues.apache.org/jira/browse/MESOS-7964 > Project: Mesos > Issue Type: Bug >Reporter: Chun-Hung Hsiao >Assignee: Chun-Hung Hsiao > > An agent is observed to performe heavy-duty GC every half an hour: > {noformat} > Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk > usage 93.61%. Max allowed age: 0ns > Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning > directories with remaining removal time 1.99022105972148days > ... > Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5' > ... > Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk > usage 90.85%. Max allowed age: 0ns > Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning > directories with remaining removal time 1.99028708946667days > ... 
> Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e' > ... > Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk > usage 91.39%. Max allowed age: 0ns > Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning > directories with remaining removal time 1.99028598086815days > ... > Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2': > No such file or directory > ... > Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk > usage 91.39%. Max allowed age: 0ns > Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning > directories with remaining removal time 1.99028057238519days > ... > Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted > '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828' > ... 
> Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk > usage 94.56%. Max allowed age: 0ns > Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning > directories with remaining removal time 1.98959316198222days > ... > Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com > mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted > '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe' > {noformat} > Each GC activity took 5+ minutes. During the period, the agent became > unresponsive, the health check timed out, and no endpoints responded either. > When a disk-usage GC is triggered, around 300 pruning actors would be generated > (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My > hypothesis is that these actors would use all of the worker threads, and > some of them took a long time to finish (possibly due to many
[jira] [Created] (MESOS-7964) Heavy-duty GC makes the agent unresponsive
Chun-Hung Hsiao created MESOS-7964: -- Summary: Heavy-duty GC makes the agent unresponsive Key: MESOS-7964 URL: https://issues.apache.org/jira/browse/MESOS-7964 Project: Mesos Issue Type: Bug Reporter: Chun-Hung Hsiao Assignee: Chun-Hung Hsiao An agent is observed to perform heavy-duty GC every half an hour: {noformat} Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 18:15:56.900282 16054 slave.cpp:5920] Current disk usage 93.61%. Max allowed age: 0ns Sep 07 18:15:56 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 18:15:56.900476 16054 gc.cpp:218] Pruning directories with remaining removal time 1.99022105972148days ... Sep 07 18:22:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 18:22:08.173645 16050 gc.cpp:178] Deleted '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__f33065c9-eb42-44a7-9013-25bafc306bd5' ... Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 18:41:08.195329 16051 slave.cpp:5920] Current disk usage 90.85%. Max allowed age: 0ns Sep 07 18:41:08 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 18:41:08.195503 16051 gc.cpp:218] Pruning directories with remaining removal time 1.99028708946667days ... Sep 07 18:49:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 18:49:01.253906 16049 gc.cpp:178] Deleted '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__014b451a-30de-41ee-b0b1-3733c790382c/runs/c5b922e8-eee0-4793-8637-7abbd7f8507e' ... Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 19:08:01.291092 16048 slave.cpp:5920] Current disk usage 91.39%. 
Max allowed age: 0ns Sep 07 19:08:01 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 19:08:01.291285 16048 gc.cpp:218] Pruning directories with remaining removal time 1.99028598086815days ... Sep 07 19:14:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: W0907 19:14:50.737226 16050 gc.cpp:174] Failed to delete '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__4139bf2e-e33b-4743-8527-f8f50ac49280/runs/b1991e28-7ff8-476f-8122-1a483e431ff2': No such file or directory ... Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 19:33:50.758191 16052 slave.cpp:5920] Current disk usage 91.39%. Max allowed age: 0ns Sep 07 19:33:50 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 19:33:50.758872 16047 gc.cpp:218] Pruning directories with remaining removal time 1.99028057238519days ... Sep 07 19:39:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 19:39:43.081485 16052 gc.cpp:178] Deleted '/var/lib/mesos/slave/meta/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0258/executors/node__d89dce1f-609b-4cf8-957a-5ba198be7828' ... Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 19:59:43.150535 16048 slave.cpp:5920] Current disk usage 94.56%. Max allowed age: 0ns Sep 07 19:59:43 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 19:59:43.150869 16054 gc.cpp:218] Pruning directories with remaining removal time 1.98959316198222days ... 
Sep 07 20:06:16 int-infinityagentm42xl6-soak110.us-east-1a.mesosphere.com mesos-agent[16040]: I0907 20:06:16.251552 16051 gc.cpp:178] Deleted '/var/lib/mesos/slave/slaves/9d4b2f2b-a759-4458-bebf-7d3507a6f0ca-S20/frameworks/9750f9be-89d9-4e02-80d3-bdced653e9c3-0259/executors/data__45283e7d-9a5e-4d4b-9901-b7f1e096cd54/runs/5cfc5e3e-3975-41aa-846b-c125eb529fbe' {noformat} Each GC activity took 5+ minutes. During the period, the agent became unresponsive, the health check timed out, and no endpoints responded either. When a disk-usage GC is triggered, around 300 pruning actors would be generated (https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L229). My hypothesis is that these actors would use all of the worker threads, and some of them took a long time to finish (possibly due to many files to delete, or too many fs operations at once, etc). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
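The saturation hypothesis above (hundreds of pruning actors occupying every worker thread) suggests bounding the concurrency of deletions. The following is an illustrative Python sketch of that idea, not the actual libprocess-based code in gc.cpp: a fixed-size worker pool prunes directories while the rest of the threads stay free.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def prune(paths, max_workers=4):
    """Delete sandbox directories with bounded concurrency.

    Illustrative analogue of the mitigation idea: instead of spawning
    one actor per directory (which can occupy every worker thread),
    cap the number of threads doing filesystem deletions at once.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(shutil.rmtree, p, True) for p in paths]
        for f in futures:
            f.result()  # propagate any unexpected errors

# Usage: create a few throwaway "run" directories and prune them.
root = Path(tempfile.mkdtemp())
dirs = [root / f"run-{i}" for i in range(8)]
for d in dirs:
    d.mkdir()
    (d / "stdout").write_text("log data")
prune(dirs)
```

With `max_workers` well below the worker-thread count, a large pruning burst no longer starves the endpoints and health checks described above.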
[jira] [Created] (MESOS-7963) Task groups can lose the container limitation status.
James Peach created MESOS-7963: -- Summary: Task groups can lose the container limitation status. Key: MESOS-7963 URL: https://issues.apache.org/jira/browse/MESOS-7963 Project: Mesos Issue Type: Bug Components: containerization, executor Reporter: James Peach If you run a single task in a task group and that task fails with a container limitation, that status update can be lost and only the executor failure will be reported to the framework. {noformat} exec /opt/mesos/bin/mesos-execute --content_type=json --master=jpeach.apple.com:5050 '--task_group={ "tasks": [ { "name": "7f141aca-55fe-4bb0-af4b-87f5ee26986a", "task_id": {"value" : "2866368d-7279-4657-b8eb-bf1d968e8ebf"}, "agent_id": {"value" : ""}, "resources": [{ "name": "cpus", "type": "SCALAR", "scalar": { "value": 0.2 } }, { "name": "mem", "type": "SCALAR", "scalar": { "value": 32 } }, { "name": "disk", "type": "SCALAR", "scalar": { "value": 2 } } ], "command": { "value": "sleep 2 ; /usr/bin/dd if=/dev/zero of=out.dat bs=1M count=64 ; sleep 1" } } ] }' I0911 11:48:01.480689 7340 scheduler.cpp:184] Version: 1.5.0 I0911 11:48:01.488868 7339 scheduler.cpp:470] New master detected at master@17.228.224.108:5050 Subscribed with ID aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010 Submitted task group with tasks [ 2866368d-7279-4657-b8eb-bf1d968e8ebf ] to agent 'aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-S0' Received status update TASK_RUNNING for task '2866368d-7279-4657-b8eb-bf1d968e8ebf' source: SOURCE_EXECUTOR Received status update TASK_FAILED for task '2866368d-7279-4657-b8eb-bf1d968e8ebf' message: 'Command terminated with signal Killed' source: SOURCE_EXECUTOR {noformat} However, the agent logs show that this failed with a memory limitation: {noformat} I0911 11:48:02.235818 7012 http.cpp:532] Processing call WAIT_NESTED_CONTAINER I0911 11:48:02.236395 7013 status_update_manager.cpp:323] Received status update TASK_RUNNING (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 2866368d-7279-4657-b8eb-bf1d968e8ebf of 
framework aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010 I0911 11:48:02.237083 7016 slave.cpp:4875] Forwarding the update TASK_RUNNING (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010 to master@17.228.224.108:5050 I0911 11:48:02.283661 7007 status_update_manager.cpp:395] Received status update acknowledgement (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010 I0911 11:48:04.771455 7014 memory.cpp:516] OOM detected for container 474388fe-43c3-4372-b903-eaca22740996 I0911 11:48:04.776445 7014 memory.cpp:556] Memory limit exceeded: Requested: 64MB Maximum Used: 64MB ... I0911 11:48:04.776943 7012 containerizer.cpp:2681] Container 474388fe-43c3-4372-b903-eaca22740996 has reached its limit for resource [{"name":"mem","scalar":{"value":64.0},"type":"SCALAR"}] and will be terminated {noformat} The following {{mesos-execute}} task will show the container limitation correctly: {noformat} exec /opt/mesos/bin/mesos-execute --content_type=json --master=jpeach.apple.com:5050 '--task_group={ "tasks": [ { "name": "37db08f6-4f0f-4ef6-97ee-b10a5c5cc211", "task_id": {"value" : "1372b2e2-c501-4e80-bcbd-1a5c5194e206"}, "agent_id": {"value" : ""}, "resources": [{ "name": "cpus", "type": "SCALAR", "scalar": { "value": 0.2 } }, { "name": "mem", "type": "SCALAR", "scalar": { "value": 32 } }], "command": { "value": "sleep 600" } }, { "name": "7247643c-5e4d-4b01-9839-e38db49f7f4d", "task_id": {"value" : "a7571608-3a53-4971-a187-41ed8be183ba"}, "agent_id": {"value" : ""}, "resources": [{ "name": "cpus", "type": "SCALAR", "scalar": { "value": 0.2 } }, { "name": "mem", "type": "SCALAR", "scalar": { "value": 32 } }, { "name": "disk", "type": "SCALAR",
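The bug above boils down to a precedence question: when the containerizer reports a resource limitation and the executor also exits, the limitation should be the status surfaced to the framework. A minimal sketch of that preference follows; the function and message names are hypothetical, not the actual agent code.

```python
def terminal_update(executor_message, limitation):
    """Pick the message to surface for a terminal task.

    Hypothetical sketch: `limitation`, when present, names the
    resource whose container limit was reached (e.g. "mem") and
    takes precedence over the generic executor failure, which is
    the information the reported logs show being lost.
    """
    if limitation is not None:
        return ("TASK_FAILED",
                "Container reached its limit for resource " + limitation)
    return ("TASK_FAILED", executor_message)

# The scenario from the logs: an OOM limitation plus executor death.
state, message = terminal_update(
    "Command terminated with signal Killed", "mem")
```

Here the memory limitation wins, matching what the agent log ("has reached its limit for resource ... and will be terminated") records but the framework never saw.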
[jira] [Created] (MESOS-7962) Display task state counters in the framework page of the webui.
Benjamin Mahler created MESOS-7962: -- Summary: Display task state counters in the framework page of the webui. Key: MESOS-7962 URL: https://issues.apache.org/jira/browse/MESOS-7962 Project: Mesos Issue Type: Improvement Components: webui Reporter: Benjamin Mahler Currently the webui displays task state counters across all frameworks on the home page, but it does not display the per-framework task state counters when you click in to a particular framework. We should add the task state counters to the per-framework page. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-1976) Sandbox browse UI has path which is not selectable
[ https://issues.apache.org/jira/browse/MESOS-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reassigned MESOS-1976: -- Resolution: Fixed Assignee: haosdent Duplicated by and fixed in MESOS-7468. > Sandbox browse UI has path which is not selectable > -- > > Key: MESOS-1976 > URL: https://issues.apache.org/jira/browse/MESOS-1976 > Project: Mesos > Issue Type: Bug > Components: webui >Affects Versions: 0.20.1 >Reporter: Steven Schlansker >Assignee: haosdent >Priority: Minor > > The Sandbox UI displays the path being browsed as a series of links. It is > not possible to copy the path from this, it ends up being formatted as e.g. > {code} > mnt > mesos > slaves > 20141022-230146-2500085258-5050-1554-3 > frameworks > Singularity > executors > ci-discovery-singularity-bridge-steven.2014.10.21T21.00.04-1414092693380-2-10-us_west_2a > runs > 554eebb3-126d-42bd-95c2-aa8282b05522 > {code} > instead of the expected > {code} > /mnt/mesos/slaves/20141022-230146-2500085258-5050-1554-3/frameworks/Singularity/executors/ci-discovery-singularity-bridge-steven.2014.10.21T21.00.04-1414092693380-2-10-us_west_2a/runs/554eebb3-126d-42bd-95c2-aa8282b05522 > > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-4341) Break up used / allocated / available resources
[ https://issues.apache.org/jira/browse/MESOS-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161778#comment-16161778 ] Benjamin Mahler commented on MESOS-4341: This ticket combines too many things: {quote} Used / allocated naming is very inconsistent now. Sometimes allocated means allocated, sometimes "total available": http://imgur.com/zLEX5pU {quote} This was a bug, fixed here: https://github.com/apache/mesos/commit/65ecc0a7c87880f382c13062d4d21c0cd178c945 {quote} I propose to have the following: used / allocated for each framework and task user / allocated / total for each slave and for cluster as a whole Master now shows idle and offered, slaves have "slack" capacity too that might be used for preemptive tasks. Not sure if it should be visualized as well. It looks like fetching cluster-wide utilization requires fetching stats from each slave, so it can be added separately later. {quote} What I'm distilling from this is a request for showing utilization from the master pages, since we already display it per agent. We haven't done this since, as you said, the master does not have utilization information of the whole cluster at the current time. It's possible we could build something for this. I'll close this out; can you file a ticket with the specific request for displaying utilization from the cluster-wide pages? > Break up used / allocated / available resources > --- > > Key: MESOS-4341 > URL: https://issues.apache.org/jira/browse/MESOS-4341 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Ivan Babrou > Attachments: really.png, Screen Shot 2016-01-12 at 8.51.39 PM.png > > > Used / allocated naming is very inconsistent now. 
Sometimes allocated means > allocated, sometimes "total available": http://imgur.com/zLEX5pU > I propose to have the following: > * used / allocated for each framework and task > * user / allocated / total for each slave and for cluster as a whole > Master now shows idle and offered, slaves have "slack" capacity too that > might be used for preemptive tasks. Not sure if it should be visualized as > well. > It looks like fetching cluster-wide utilization requires fetching stats from > each slave, so it can be added separately later. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7961) Display task health in the webui.
Benjamin Mahler created MESOS-7961: -- Summary: Display task health in the webui. Key: MESOS-7961 URL: https://issues.apache.org/jira/browse/MESOS-7961 Project: Mesos Issue Type: Improvement Components: webui Reporter: Benjamin Mahler Currently the webui does not display task health based on the latest status update. Since this information is in the protobuf, it is within the webui's scope to display health information. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7867) Master doesn't handle scheduler driver downgrade from HTTP based to PID based
[ https://issues.apache.org/jira/browse/MESOS-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-7867: -- Shepherd: Anand Mazumdar > Master doesn't handle scheduler driver downgrade from HTTP based to PID based > - > > Key: MESOS-7867 > URL: https://issues.apache.org/jira/browse/MESOS-7867 > Project: Mesos > Issue Type: Bug > Components: master >Affects Versions: 1.3.0 >Reporter: Ilya Pronin >Assignee: Ilya Pronin > > When a framework upgrades from a PID based driver to an HTTP based driver, > master removes its per-framework-principal metrics ({{messages_received}} and > {{messages_processed}}) in {{Master::failoverFramework}}. When the same > framework downgrades back to a PID based driver, the master doesn't reinstate > those metrics. This causes a crash when the master receives a message from > the failed over framework and increments {{messages_received}} counter in > {{Master::visit(const MessageEvent&)}}. > {noformat} > I0807 18:17:45.713220 19095 master.cpp:2916] Framework > 70822e80-ca38-4470-916e-e6da073a4742- (TwitterScheduler) failed over > F0807 18:18:20.725908 19079 master.cpp:1451] Check failed: > metrics->frameworks.contains(principal.get()) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
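The crash above is the master check-failing because a per-principal counter was removed on HTTP failover and never reinstated. A defensive sketch of the fix idea, in illustrative Python rather than the actual C++ metrics code, reinstates the counter on demand instead of asserting it exists:

```python
class FrameworkMetrics:
    """Toy model of per-framework-principal message counters."""

    def __init__(self):
        self.messages_received = {}

    def remove(self, principal):
        # What the report says happens on a PID -> HTTP failover:
        # the principal's counters are dropped.
        self.messages_received.pop(principal, None)

    def on_message(self, principal):
        # Reinstate the counter if it was removed, instead of
        # crashing on a missing-key check.
        self.messages_received.setdefault(principal, 0)
        self.messages_received[principal] += 1

m = FrameworkMetrics()
m.on_message("TwitterScheduler")
m.remove("TwitterScheduler")      # upgrade to the HTTP driver
m.on_message("TwitterScheduler")  # downgrade back to the PID driver
```

The downgraded framework's first message then recreates the counter rather than tripping the `metrics->frameworks.contains(...)` check.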
[jira] [Created] (MESOS-7960) Deprecate non virtual path browse/read for sandbox
Zhitao Li created MESOS-7960: Summary: Deprecate non virtual path browse/read for sandbox Key: MESOS-7960 URL: https://issues.apache.org/jira/browse/MESOS-7960 Project: Mesos Issue Type: Improvement Reporter: Zhitao Li Priority: Minor We added support to browse and read files in an executor's latest sandbox run directory in MESOS-7899. We should remove support for the physical path after Mesos 2.0 because it requires the {{work_dir}} and {{agent_id}}, which are not necessary to expose to frameworks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7634) OsTest.ChownNoAccess fails on s390x machines
[ https://issues.apache.org/jira/browse/MESOS-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161209#comment-16161209 ] Prajakta Bhatekar commented on MESOS-7634: -- Hi Vinod, Looks like the failure in OsTest.ChownNoAccess is due to a wrong status set by the FTS functions of gnulib. They are provided by the gnutils (coreutils on Ubuntu) package and are responsible for traversing the directory structure. For s390x, the status flag is set to FTS_DP (i.e. post-order directory) instead of the expected FTS_DNR (unreadable directory) for a directory with 0 permissions. At line 47 in file 3rdparty/stout/include/stout/os/posix/chown.hpp, the "fts_info" status flag of the FTSENT structure returned by fts_read() is set to 6 (FTS_DP), post-order directory. The expected value is 4 (FTS_DNR), unreadable directory. Thus the default case gets executed and no error is returned, causing the assertion to fail. Investigating why the status flag is set differently on x86 and s390x. > OsTest.ChownNoAccess fails on s390x machines > > > Key: MESOS-7634 > URL: https://issues.apache.org/jira/browse/MESOS-7634 > Project: Mesos > Issue Type: Bug >Reporter: Vinod Kone >Assignee: Nayana Thorat > > Running a custom branch of Mesos (with some fixes in docker build scripts for > s390x) on s390x based CI machines throws the following error when running > stout tests. > {code} > [ RUN ] OsTest.ChownNoAccess > ../../../../3rdparty/stout/tests/os_tests.cpp:839: Failure > Value of: os::chown(uid.get(), gid.get(), "one", true).isError() > Actual: false > Expected: true > ../../../../3rdparty/stout/tests/os_tests.cpp:840: Failure > Value of: os::chown(uid.get(), gid.get(), "one/two", true).isError() > Actual: false > {code} > One can repro this by building Mesos from my custom branch here: > https://github.com/vinodkone/mesos/tree/vinod/s390x -- This message was sent by Atlassian JIRA (v6.4.14#64029)
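The behavior under discussion, a traversal that should surface unreadable directories as errors rather than silently continuing past them, can be mimicked with `os.walk` and its `onerror` callback. This is a Python analogue of the fts-based loop in `chown.hpp` (where FTS_DNR is the error case), not the actual stout code:

```python
import os

def walk_collecting_errors(root):
    """Walk a tree, collecting traversal errors instead of ignoring them.

    Analogue of the expectation in the fts-based chown code: a
    directory that cannot be read (FTS_DNR there) should produce an
    error, not be silently treated like an ordinary visited directory.
    """
    errors = []
    visited = []
    for dirpath, dirnames, filenames in os.walk(root, onerror=errors.append):
        visited.append(dirpath)
    return visited, errors

# A nonexistent root deterministically exercises the error path.
visited, errors = walk_collecting_errors("/no/such/directory")
```

Without the `onerror` callback, `os.walk` swallows such errors, which parallels the s390x failure mode where the unreadable directory never reaches the error branch.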
[jira] [Updated] (MESOS-7959) Missing documentation for registry garbage collection flags
[ https://issues.apache.org/jira/browse/MESOS-7959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier updated MESOS-7959: Description: * The master registry garbage collection-related master flags ({{registry_gc_interval}}, {{registry_max_agent_age}}, {{registry_max_agent_count}}) appear to be missing from the documentation in e.g., {{configuration.md}}. We should add documentation for them, and audit the existing documentation for additional missing flags. was: The master registry garbage collection-related master flags (appear to be missing from the documentation in e.g., {{configuration.md}}. We should add documentation for them, and audit the existing documentation for additional missing flags. > Missing documentation for registry garbage collection flags > --- > > Key: MESOS-7959 > URL: https://issues.apache.org/jira/browse/MESOS-7959 > Project: Mesos > Issue Type: Documentation > Components: documentation, master >Reporter: Benjamin Bannier > Labels: mesosphere, newbie > > * The master registry garbage collection-related master flags > ({{registry_gc_interval}}, {{registry_max_agent_age}}, > {{registry_max_agent_count}}) appear to be missing from the documentation in > e.g., {{configuration.md}}. > We should add documentation for them, and audit the existing documentation > for additional missing flags. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7959) Missing documentation for registry garbage collection flags
[ https://issues.apache.org/jira/browse/MESOS-7959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bannier updated MESOS-7959: Description: The master registry garbage collection-related master flags (appear to be missing from the documentation in e.g., {{configuration.md}}. We should add documentation for them, and audit the existing documentation for additional missing flags. was: The master registry garbage collection-related master flags appear to be missing from the documentation in e.g., {{configuration.md}}. We should add documentation for them, and audit the existing documentation for additional missing flags. > Missing documentation for registry garbage collection flags > --- > > Key: MESOS-7959 > URL: https://issues.apache.org/jira/browse/MESOS-7959 > Project: Mesos > Issue Type: Documentation > Components: documentation, master >Reporter: Benjamin Bannier > Labels: mesosphere, newbie > > The master registry garbage collection-related master flags (appear to be > missing from the documentation in e.g., {{configuration.md}}. > We should add documentation for them, and audit the existing documentation > for additional missing flags. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7959) Missing documentation for registry garbage collection flags
Benjamin Bannier created MESOS-7959: --- Summary: Missing documentation for registry garbage collection flags Key: MESOS-7959 URL: https://issues.apache.org/jira/browse/MESOS-7959 Project: Mesos Issue Type: Documentation Components: documentation, master Reporter: Benjamin Bannier The master registry garbage collection-related master flags appear to be missing from the documentation in e.g., {{configuration.md}}. We should add documentation for them, and audit the existing documentation for additional missing flags. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7924) Add a javascript linter to the webui.
[ https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161003#comment-16161003 ] Armand Grillet commented on MESOS-7924: --- I have worked on the review request linked above over the weekend. This is currently a WIP that I have published mainly to get your review concerning the file {{.eslintrc.js}}. This file defines which errors/warnings the linter should display, so a carefully considered review of it is important. I created the one in the patch by running {{eslint --init}} and answering these questions without being too careful: {code} (web-ui) static (eslint) $ eslint --init ? How would you like to configure ESLint? Inspect your JavaScript file(s) ? Which file(s), path(s), or glob(s) should be examined? /Users/Armand/Code/apache-mesos/src/webui/master/static/js/app.js ? What format do you want your config file to be in? JavaScript ? Are you using ECMAScript 6 features? No ? Where will your code run? Browser ? Do you use CommonJS? No ? Do you use JSX? No {code} The rest of the patch is similar to what we do for our Python linter. This means adding a {{.gitignore}} to the repository, which is not conventional but already done in {{src/python/cli_new}}; adding {{.virtualenv}} to the root {{.gitignore}} is also a possibility. > Add a javascript linter to the webui. > - > > Key: MESOS-7924 > URL: https://issues.apache.org/jira/browse/MESOS-7924 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Benjamin Mahler >Assignee: Armand Grillet > Labels: tech-debt > > As far as I can tell, javascript linters (e.g. ESLint) help catch some > functional errors as well, for example, we've made some "strict" mistakes a > few times that ESLint can catch: MESOS-6624, MESOS-7912. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-7944) Implement jemalloc support for Mesos
[ https://issues.apache.org/jira/browse/MESOS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161001#comment-16161001 ] Benno Evers edited comment on MESOS-7944 at 9/11/17 9:59 AM: - Since I've started to work on this, I have now a much better idea of what needs to be done. First of all, since the added features are not mesos-specific, I think it's best to add them directly to libprocess. However, the choice of preferred malloc should be up to the binary, not enforced by a shared library, so instead of compiling against jemalloc we should detect at runtime whether we're running under jemalloc or not. (similar to what folly does here: https://github.com/facebook/folly/blob/master/folly/Malloc.h#L150) At the endpoint, the minimum features I would like are the ability to get the (exact) heap allocation statistics as JSON, or download current (stochastic) heap profile dumps as files. Depending on the complexity of it, we should also think about providing a way to have the master dump profiles periodically and store them on disk, and a way to generate jeprof-graphs automatically. Finally, the new `--enable-memory-profiling` configure option (tentative name) for mesos would build a bundled version of jemalloc with all the necessary configuration options enabled, and link the mesos-master and mesos-slave binaries against this library. was (Author: bennoe): Since I've started to work on this, I have now a much sharper idea of what needs to be done. First of all, since the added features are not mesos-specific, I think it's best to add them directly to libprocess. However, the choice of preferred malloc should be up to the binary, not enforced by a shared library, so instead of compiling against jemalloc we should detect at runtime whether we're running under jemalloc or not. 
(similar to what folly does here: https://github.com/facebook/folly/blob/master/folly/Malloc.h#L150) At the endpoint, the minimum features I would like are the ability to get the (exact) heap allocation statistics as JSON, or download current (stochastic) heap profile dumps as files. Depending on the complexity of it, we should also think about providing a way to have the master dump profiles periodically and store them on disk, and a way to generate jeprof-graphs automatically. Finally, the new `--enable-memory-profiling` configure option (tentative name) for mesos would build a bundled version of jemalloc with all the necessary configuration options enabled, and link the mesos-master and mesos-slave binaries against this library. > Implement jemalloc support for Mesos > > > Key: MESOS-7944 > URL: https://issues.apache.org/jira/browse/MESOS-7944 > Project: Mesos > Issue Type: Bug >Reporter: Benno Evers >Assignee: Benno Evers > > After investigation in MESOS-7876 and discussion on the mailing list, this > task is for tracking progress on adding out-of-the-box memory profiling > support using jemalloc to Mesos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7944) Implement jemalloc support for Mesos
[ https://issues.apache.org/jira/browse/MESOS-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161001#comment-16161001 ] Benno Evers commented on MESOS-7944: Since I've started to work on this, I have now a much sharper idea of what needs to be done. First of all, since the added features are not mesos-specific, I think it's best to add them directly to libprocess. However, the choice of preferred malloc should be up to the binary, not enforced by a shared library, so instead of compiling against jemalloc we should detect at runtime whether we're running under jemalloc or not. (similar to what folly does here: https://github.com/facebook/folly/blob/master/folly/Malloc.h#L150) At the endpoint, the minimum features I would like are the ability to get the (exact) heap allocation statistics as JSON, or download current (stochastic) heap profile dumps as files. Depending on the complexity of it, we should also think about providing a way to have the master dump profiles periodically and store them on disk, and a way to generate jeprof-graphs automatically. Finally, the new `--enable-memory-profiling` configure option (tentative name) for mesos would build a bundled version of jemalloc with all the necessary configuration options enabled, and link the mesos-master and mesos-slave binaries against this library. > Implement jemalloc support for Mesos > > > Key: MESOS-7944 > URL: https://issues.apache.org/jira/browse/MESOS-7944 > Project: Mesos > Issue Type: Bug >Reporter: Benno Evers >Assignee: Benno Evers > > After investigation in MESOS-7876 and discussion on the mailing list, this > task is for tracking progress on adding out-of-the-box memory profiling > support using jemalloc to Mesos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
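The runtime-detection idea above (check whether a jemalloc-specific symbol such as `mallctl` is resolvable in the running process, as folly does with weak symbols) can be sketched with `ctypes`. This is an illustrative analogue only, not the C++ weak-symbol approach libprocess would actually use:

```python
import ctypes

def running_under_jemalloc():
    """Return True if the jemalloc control symbol is loadable.

    jemalloc exports `mallctl`; if the process was linked with (or
    preloaded with) jemalloc, the symbol resolves in the process's
    global namespace. Otherwise the lookup fails.
    """
    try:
        ctypes.CDLL(None).mallctl  # dlsym in the global namespace
        return True
    except (AttributeError, OSError):
        return False

detected = running_under_jemalloc()
```

A process launched with `LD_PRELOAD=libjemalloc.so` would report `True` here; a glibc-malloc process reports `False`, which is the branch where the proposed endpoints would decline to serve profiles.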
[jira] [Assigned] (MESOS-7924) Add a javascript linter to the webui.
[ https://issues.apache.org/jira/browse/MESOS-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Armand Grillet reassigned MESOS-7924: - Assignee: Armand Grillet > Add a javascript linter to the webui. > - > > Key: MESOS-7924 > URL: https://issues.apache.org/jira/browse/MESOS-7924 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Benjamin Mahler >Assignee: Armand Grillet > Labels: tech-debt > > As far as I can tell, javascript linters (e.g. ESLint) help catch some > functional errors as well, for example, we've made some "strict" mistakes a > few times that ESLint can catch: MESOS-6624, MESOS-7912. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7958) The example framework `test-framework` is broken.
[ https://issues.apache.org/jira/browse/MESOS-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160835#comment-16160835 ] Michael Park commented on MESOS-7958: - The example frameworks should work without having to be installed. The fact that it works with {{LD_LIBRARY_PATH}} set is good to know, but it used to work without that, and I think it should continue to. > The example framework `test-framework` is broken. > - > > Key: MESOS-7958 > URL: https://issues.apache.org/jira/browse/MESOS-7958 > Project: Mesos > Issue Type: Bug > Components: framework >Reporter: Michael Park > Attachments: screenshot-1.png > > > The {{test-framework}} example framework does not work. > Launching a cluster like so: > {code} > MESOS_RESOURCES="cpus:32;mem:512;disk:1024" MESOS_REGISTRY="in_memory" > ./bin/mesos-local.sh --num_slaves=1 --ip=127.0.0.1 --port=4040 > --work_dir=$HOME/mesos-local > {code} > and trying to launch the {{test-framework}} like so: > {code} > ./src/test-framework --master=127.0.0.1:4040 > {code} > {code} > /home/mpark/projects/mesos/build/src/.libs/test-executor: error while loading > shared libraries: libmesos-1.5.0.so: cannot open shared object file: No such > file or directory > {code} > It seems that {{test-executor}} cannot load {{libmesos.so}} correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7958) The example framework `test-framework` is broken.
[ https://issues.apache.org/jira/browse/MESOS-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160831#comment-16160831 ] Christopher Ogle commented on MESOS-7958: - Yep, that worked. [~mcypark] would you maybe want me to explore setting up a Troubleshooting section that tracks and attempts to answer issues like these? > The example framework `test-framework` is broken. > - > > Key: MESOS-7958 > URL: https://issues.apache.org/jira/browse/MESOS-7958 > Project: Mesos > Issue Type: Bug > Components: framework >Reporter: Michael Park > Attachments: screenshot-1.png > > > The {{test-framework}} example framework does not work. > Launching a cluster like so: > {code} > MESOS_RESOURCES="cpus:32;mem:512;disk:1024" MESOS_REGISTRY="in_memory" > ./bin/mesos-local.sh --num_slaves=1 --ip=127.0.0.1 --port=4040 > --work_dir=$HOME/mesos-local > {code} > and trying to launch the {{test-framework}} like so: > {code} > ./src/test-framework --master=127.0.0.1:4040 > {code} > {code} > /home/mpark/projects/mesos/build/src/.libs/test-executor: error while loading > shared libraries: libmesos-1.5.0.so: cannot open shared object file: No such > file or directory > {code} > It seems that {{test-executor}} cannot load {{libmesos.so}} correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-7958) The example framework `test-framework` is broken.
[ https://issues.apache.org/jira/browse/MESOS-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160831#comment-16160831 ] Christopher Ogle edited comment on MESOS-7958 at 9/11/17 7:23 AM: -- Yep, that worked. [~mcypark] would you maybe want me to explore setting up a Troubleshooting section that tracks and attempts to answer issues like these? was (Author: cogle): Yep, that worked. [~mcypark] would you maybe want me to explore setting up a Troubleshooting section that tracks and attempts to answer issues like these. > The example framework `test-framework` is broken. > - > > Key: MESOS-7958 > URL: https://issues.apache.org/jira/browse/MESOS-7958 > Project: Mesos > Issue Type: Bug > Components: framework >Reporter: Michael Park > Attachments: screenshot-1.png > > > The {{test-framework}} example framework does not work. > Launching a cluster like so: > {code} > MESOS_RESOURCES="cpus:32;mem:512;disk:1024" MESOS_REGISTRY="in_memory" > ./bin/mesos-local.sh --num_slaves=1 --ip=127.0.0.1 --port=4040 > --work_dir=$HOME/mesos-local > {code} > and trying to launch the {{test-framework}} like so: > {code} > ./src/test-framework --master=127.0.0.1:4040 > {code} > {code} > /home/mpark/projects/mesos/build/src/.libs/test-executor: error while loading > shared libraries: libmesos-1.5.0.so: cannot open shared object file: No such > file or directory > {code} > It seems that {{test-executor}} cannot load {{libmesos.so}} correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-7958) The example framework `test-framework` is broken.
[ https://issues.apache.org/jira/browse/MESOS-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160831#comment-16160831 ] Christopher Ogle edited comment on MESOS-7958 at 9/11/17 7:23 AM: -- Yep, that worked, thanks. [~mcypark] would you maybe want me to explore setting up a Troubleshooting section that tracks and attempts to answer issues like these? was (Author: cogle): Yep, that worked. [~mcypark] would you maybe want me to explore setting up a Troubleshooting section that tracks and attempts to answer issues like these? > The example framework `test-framework` is broken. > - > > Key: MESOS-7958 > URL: https://issues.apache.org/jira/browse/MESOS-7958 > Project: Mesos > Issue Type: Bug > Components: framework >Reporter: Michael Park > Attachments: screenshot-1.png > > > The {{test-framework}} example framework does not work. > Launching a cluster like so: > {code} > MESOS_RESOURCES="cpus:32;mem:512;disk:1024" MESOS_REGISTRY="in_memory" > ./bin/mesos-local.sh --num_slaves=1 --ip=127.0.0.1 --port=4040 > --work_dir=$HOME/mesos-local > {code} > and trying to launch the {{test-framework}} like so: > {code} > ./src/test-framework --master=127.0.0.1:4040 > {code} > {code} > /home/mpark/projects/mesos/build/src/.libs/test-executor: error while loading > shared libraries: libmesos-1.5.0.so: cannot open shared object file: No such > file or directory > {code} > It seems that {{test-executor}} cannot load {{libmesos.so}} correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)