[jira] [Assigned] (MESOS-8057) Apply security patches to AngularJS and JQuery in the Mesos UI
[ https://issues.apache.org/jira/browse/MESOS-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rojas reassigned MESOS-8057: -- Assignee: Alexander Rojas > Apply security patches to AngularJS and JQuery in the Mesos UI > -- > > Key: MESOS-8057 > URL: https://issues.apache.org/jira/browse/MESOS-8057 > Project: Mesos > Issue Type: Bug > Components: webui >Affects Versions: 1.4.0 >Reporter: Alexander Rojas >Assignee: Alexander Rojas >Priority: Blocker > Labels: mesosphere > > Running a security tool returns: > {noformat} > Evidence > Vulnerable libraries were found: > https://admin.kpn-dsh.com/mesos/static/js/angular-1.2.3.min.js > https://admin.kpn-dsh.com/mesos/static/js/angular-route-1.2.3.min.js > https://admin.kpn-dsh.com/mesos/static/js/jquery-1.7.1.min.js > More information about the issues can be found at: - > https://github.com/angular/angular.js/blob/master/CHANGELOG.md - > http://bugs.jquery.com/ticket/11290 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8109) Broken markup in `Attaching containers to CNI networks`
Wilfried Goesgens created MESOS-8109: Summary: Broken markup in `Attaching containers to CNI networks` Key: MESOS-8109 URL: https://issues.apache.org/jira/browse/MESOS-8109 Project: Mesos Issue Type: Documentation Reporter: Wilfried Goesgens Priority: Trivial On http://mesos.apache.org/documentation/latest/cni/ under 'Attaching containers to CNI networks' the **NOTE** section is broken - it probably shouldn't be a verbatim box. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-7594) Implement 'apply' for resource provider related operations
[ https://issues.apache.org/jira/browse/MESOS-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150341#comment-16150341 ] Jan Schlicht edited comment on MESOS-7594 at 10/18/17 2:35 PM: --- https://reviews.apache.org/r/63104/ https://reviews.apache.org/r/61810/ https://reviews.apache.org/r/61946/ https://reviews.apache.org/r/63105/ https://reviews.apache.org/r/61947/ was (Author: nfnt): https://reviews.apache.org/r/61810/ https://reviews.apache.org/r/61946/ https://reviews.apache.org/r/61947/ > Implement 'apply' for resource provider related operations > -- > > Key: MESOS-7594 > URL: https://issues.apache.org/jira/browse/MESOS-7594 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Jan Schlicht >Assignee: Jan Schlicht > Labels: mesosphere, storage > > Resource providers provide new offer operations ({{CREATE_BLOCK}}, > {{DESTROY_BLOCK}}, {{CREATE_VOLUME}}, {{DESTROY_VOLUME}}). These operations > can be applied by frameworks when they accept an offer. Handling of these > operations has to be added to the master's {{accept}} call. I.e. the > corresponding resource provider needs to be extracted from the offer's resources > and a {{resource_provider::Event::OPERATION}} has to be sent to the resource > provider. The resource provider will answer with a > {{resource_provider::Call::Update}}, which needs to be handled as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Issue Comment Deleted] (MESOS-8057) Apply security patches to AngularJS and JQuery in the Mesos UI
[ https://issues.apache.org/jira/browse/MESOS-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rojas updated MESOS-8057: --- Comment: was deleted (was: Changes landed in Mesos master branch. They will be part of next Mesos bump on DC/OS {noformat} commit b0a660bb1811c0144cba781482b1ce4573e685b3 Author: Alexander Rojas AuthorDate: Wed Oct 18 12:11:05 2017 +0200 Commit: Alexander Rojas CommitDate: Wed Oct 18 16:33:19 2017 +0200 Upgrades jQuery used by Mesos WebUI to version 3.2.1. The version of jQuery distributed with Mesos (1.7.1) was found to have security issues which have been addressed in latter versions. Review: https://reviews.apache.org/r/63101 {noformat} {noformat} commit 1b5a4e77e55f5c8665526294626a66905569a284 (HEAD -> master, upstream/master) Author: Alexander Rojas AuthorDate: Wed Oct 18 12:11:40 2017 +0200 Commit: Alexander Rojas CommitDate: Wed Oct 18 16:33:37 2017 +0200 Upgrades AngularJS used by Mesos WebUI to version 1.2.32. The version of AngularJS distributed with Mesos (1.2.3) was found to have security issues which have been addressed in latter versions. 
Review: https://reviews.apache.org/r/63102 {noformat}) > Apply security patches to AngularJS and JQuery in the Mesos UI > -- > > Key: MESOS-8057 > URL: https://issues.apache.org/jira/browse/MESOS-8057 > Project: Mesos > Issue Type: Bug > Components: webui >Affects Versions: 1.4.0 >Reporter: Alexander Rojas >Assignee: Alexander Rojas >Priority: Blocker > Labels: mesosphere > Fix For: 1.5.0 > > > Running a security tool returns: > {noformat} > Evidence > Vulnerable libraries were found: > https://admin.kpn-dsh.com/mesos/static/js/angular-1.2.3.min.js > https://admin.kpn-dsh.com/mesos/static/js/angular-route-1.2.3.min.js > https://admin.kpn-dsh.com/mesos/static/js/jquery-1.7.1.min.js > More information about the issues can be found at: - > https://github.com/angular/angular.js/blob/master/CHANGELOG.md - > http://bugs.jquery.com/ticket/11290 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-8057) Apply security patches to AngularJS and JQuery in the Mesos UI
[ https://issues.apache.org/jira/browse/MESOS-8057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209438#comment-16209438 ] Alexander Rojas commented on MESOS-8057: Changes landed in the Mesos master branch. They will be part of the next Mesos bump on DC/OS. {noformat} commit b0a660bb1811c0144cba781482b1ce4573e685b3 Author: Alexander Rojas AuthorDate: Wed Oct 18 12:11:05 2017 +0200 Commit: Alexander Rojas CommitDate: Wed Oct 18 16:33:19 2017 +0200 Upgrades jQuery used by Mesos WebUI to version 3.2.1. The version of jQuery distributed with Mesos (1.7.1) was found to have security issues which have been addressed in latter versions. Review: https://reviews.apache.org/r/63101 {noformat} {noformat} commit 1b5a4e77e55f5c8665526294626a66905569a284 (HEAD -> master, upstream/master) Author: Alexander Rojas AuthorDate: Wed Oct 18 12:11:40 2017 +0200 Commit: Alexander Rojas CommitDate: Wed Oct 18 16:33:37 2017 +0200 Upgrades AngularJS used by Mesos WebUI to version 1.2.32. The version of AngularJS distributed with Mesos (1.2.3) was found to have security issues which have been addressed in latter versions. 
Review: https://reviews.apache.org/r/63102 {noformat} > Apply security patches to AngularJS and JQuery in the Mesos UI > -- > > Key: MESOS-8057 > URL: https://issues.apache.org/jira/browse/MESOS-8057 > Project: Mesos > Issue Type: Bug > Components: webui >Affects Versions: 1.4.0 >Reporter: Alexander Rojas >Assignee: Alexander Rojas >Priority: Blocker > Labels: mesosphere > Fix For: 1.5.0 > > > Running a security tool returns: > {noformat} > Evidence > Vulnerable libraries were found: > https://admin.kpn-dsh.com/mesos/static/js/angular-1.2.3.min.js > https://admin.kpn-dsh.com/mesos/static/js/angular-route-1.2.3.min.js > https://admin.kpn-dsh.com/mesos/static/js/jquery-1.7.1.min.js > More information about the issues can be found at: - > https://github.com/angular/angular.js/blob/master/CHANGELOG.md - > http://bugs.jquery.com/ticket/11290 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8110) Mesos Maintenance UI not rendering End time correctly
Vishnu Mohan created MESOS-8110: --- Summary: Mesos Maintenance UI not rendering End time correctly Key: MESOS-8110 URL: https://issues.apache.org/jira/browse/MESOS-8110 Project: Mesos Issue Type: Bug Components: webui Affects Versions: 1.4.0 Reporter: Vishnu Mohan The {{Begin}} time (e.g., {{2017-10-18T10:54:45-0400}}) and {{End}} time (e.g., {{2017-10-18T11:54:45-0400}}) are both rendered as {{just now}} when a maintenance window is initially POST'ed (even though they're an hour apart) and the {{End}} time never updates although the human-friendly (relative) {{Begin}} time does. These scripts may be used to reproduce the issue: https://github.com/vishnu2kmohan/dcos-toolbox/blob/master/mesos/maintain-agents.sh https://github.com/vishnu2kmohan/dcos-toolbox/blob/master/mesos/agent-maintenance-schedule-example.json -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7506) Multiple tests leave orphan containers.
[ https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209541#comment-16209541 ] Andrei Budnik commented on MESOS-7506: -- All failing tests have the same error message in the logs: {{E0922 00:38:40.509032 31034 slave.cpp:5398] Termination of executor '1' of framework 83bd1613-70d9-4c3e-b490-4aa60dd26e22- failed: Failed to kill all processes in the container: Timed out after 1mins}} The container termination future is triggered by [MesosContainerizerProcess::___destroy|https://github.com/apache/mesos/blob/b361801f2c78043459199dab3e0defe9a0b4c1aa/src/slave/containerizer/mesos/containerizer.cpp#L2361]. The agent subscribes to this future by calling [containerizer->wait()|https://github.com/apache/mesos/blob/b361801f2c78043459199dab3e0defe9a0b4c1aa/src/slave/slave.cpp#L5280]. Triggering this future leads to a call to {{Slave::executorTerminated}}, which sends a {{TASK_FAILED}} status update. A typical test (e.g. {{SlaveTest.ShutdownUnregisteredExecutor}}) waits for {code} // Ensure that the slave times out and kills the executor. Future<Nothing> destroyExecutor = FUTURE_DISPATCH(_, &MesosContainerizerProcess::destroy); {code} After that, the test waits for the {{TASK_FAILED}} status update. So, this test completes successfully and the slave's destructor is called, [which fails|https://github.com/apache/mesos/blob/b361801f2c78043459199dab3e0defe9a0b4c1aa/src/tests/cluster.cpp#L580], because {{MesosContainerizerProcess::___destroy}} doesn't erase the container from the hashmap. > Multiple tests leave orphan containers. > --- > > Key: MESOS-7506 > URL: https://issues.apache.org/jira/browse/MESOS-7506 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Ubuntu 16.04 > Fedora 23 > other Linux distros >Reporter: Alexander Rukletsov >Assignee: Andrei Budnik > Labels: containerizer, flaky-test, mesosphere > > I've observed a number of flaky tests that leave orphan containers upon > cleanup. 
A typical log looks like this: > {noformat} > ../../src/tests/cluster.cpp:580: Failure > Value of: containers->empty() > Actual: false > Expected: true > Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 } > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8111) Mesos sees task as running, but cannot kill it because the agent is offline
Cosmin Lehene created MESOS-8111: Summary: Mesos sees task as running, but cannot kill it because the agent is offline Key: MESOS-8111 URL: https://issues.apache.org/jira/browse/MESOS-8111 Project: Mesos Issue Type: Bug Components: master Affects Versions: 1.2.3 Environment: DC/OS 1.9.4 Reporter: Cosmin Lehene After scaling down a cluster, the master is reporting a task as running although the slave has been long gone. At the same time it reports it can't kill it because the agent is offline {noformat} I1018 16:55:22.00 6976 master.cpp:4913] Processing KILL call for task 'spark.7b59a77b-b353-11e7-addd-b29ecbf071e1' of framework 4d2a982a-0e62-4471-88e8-8df9cc0ae437-0001 (marathon) at scheduler-45eafb76-4510-482e-9bcc-06e3ad97c276@172.16.0.7:15101 W1018 16:55:22.00 6976 master.cpp:5000] Cannot kill task spark.7b59a77b-b353-11e7-addd-b29ecbf071e1 of framework 4d2a982a-0e62-4471-88e8-8df9cc0ae437-0001 (marathon) at scheduler-45eafb76-4510-482e-9bcc-06e3ad97c276@172.16.0.7:15101 because the agent 4d2a982a-0e62-4471-88e8-8df9cc0ae437-S129 at slave(1)@10.0.0.81:5051 (10.0.0.81) is disconnected. Kill will be retried if the agent re-registers {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8111) Mesos sees task as running, but cannot kill it because the agent is offline
[ https://issues.apache.org/jira/browse/MESOS-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cosmin Lehene updated MESOS-8111: - Description: After scaling down a cluster, the master is reporting a task as running although the slave has been long gone. At the same time it reports it can't kill it because the agent is offline {noformat} I1018 16:55:22.00 6976 master.cpp:4913] Processing KILL call for task 'spark.7b59a77b-b353-11e7-addd-b29ecbf071e1' of framework 4d2a982a-0e62-4471-88e8-8df9cc0ae437-0001 (marathon) at scheduler-45eafb76-4510-482e-9bcc-06e3ad97c276@172.16.0.7:15101 W1018 16:55:22.00 6976 master.cpp:5000] Cannot kill task spark.7b59a77b-b353-11e7-addd-b29ecbf071e1 of framework 4d2a982a-0e62-4471-88e8-8df9cc0ae437-0001 (marathon) at scheduler-45eafb76-4510-482e-9bcc-06e3ad97c276@172.16.0.7:15101 because the agent 4d2a982a-0e62-4471-88e8-8df9cc0ae437-S129 at slave(1)@10.0.0.81:5051 (10.0.0.81) is disconnected. Kill will be retried if the agent re-registers {noformat} Clearly, if the agent is offline the task is also not running. Also not sure waiting indefinitely for an agent to recover is a good strategy. was: After scaling down a cluster, the master is reporting a task as running although the slave has been long gone. At the same time it reports it can't kill it because the agent is offline {noformat} I1018 16:55:22.00 6976 master.cpp:4913] Processing KILL call for task 'spark.7b59a77b-b353-11e7-addd-b29ecbf071e1' of framework 4d2a982a-0e62-4471-88e8-8df9cc0ae437-0001 (marathon) at scheduler-45eafb76-4510-482e-9bcc-06e3ad97c276@172.16.0.7:15101 W1018 16:55:22.00 6976 master.cpp:5000] Cannot kill task spark.7b59a77b-b353-11e7-addd-b29ecbf071e1 of framework 4d2a982a-0e62-4471-88e8-8df9cc0ae437-0001 (marathon) at scheduler-45eafb76-4510-482e-9bcc-06e3ad97c276@172.16.0.7:15101 because the agent 4d2a982a-0e62-4471-88e8-8df9cc0ae437-S129 at slave(1)@10.0.0.81:5051 (10.0.0.81) is disconnected. 
Kill will be retried if the agent re-registers {noformat} > Mesos sees task as running, but cannot kill it because the agent is offline > --- > > Key: MESOS-8111 > URL: https://issues.apache.org/jira/browse/MESOS-8111 > Project: Mesos > Issue Type: Bug > Components: master >Affects Versions: 1.2.3 > Environment: DC/OS 1.9.4 >Reporter: Cosmin Lehene > > After scaling down a cluster, the master is reporting a task as running > although the slave has been long gone. > At the same time it reports it can't kill it because the agent is offline > {noformat} > I1018 16:55:22.00 6976 master.cpp:4913] Processing KILL call for task > 'spark.7b59a77b-b353-11e7-addd-b29ecbf071e1' of framework > 4d2a982a-0e62-4471-88e8-8df9cc0ae437-0001 (marathon) at > scheduler-45eafb76-4510-482e-9bcc-06e3ad97c276@172.16.0.7:15101 > W1018 16:55:22.00 6976 master.cpp:5000] Cannot kill task > spark.7b59a77b-b353-11e7-addd-b29ecbf071e1 of framework > 4d2a982a-0e62-4471-88e8-8df9cc0ae437-0001 (marathon) at > scheduler-45eafb76-4510-482e-9bcc-06e3ad97c276@172.16.0.7:15101 because the > agent 4d2a982a-0e62-4471-88e8-8df9cc0ae437-S129 at slave(1)@10.0.0.81:5051 > (10.0.0.81) is disconnected. Kill will be retried if the agent re-registers > {noformat} > Clearly, if the agent is offline the task is also not running. Also not sure > waiting indefinitely for an agent to recover is a good strategy. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7941) Send TASK_STARTING status from built-in executors
[ https://issues.apache.org/jira/browse/MESOS-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209853#comment-16209853 ] Alexander Rukletsov commented on MESOS-7941: Reverting {{f43710eabb1c0956b368e9f855b26bebcf8cbc7a}} and {{1e1e409b3906d1a6189d5dfd47b21df7680244f6}} due to failing tests. > Send TASK_STARTING status from built-in executors > - > > Key: MESOS-7941 > URL: https://issues.apache.org/jira/browse/MESOS-7941 > Project: Mesos > Issue Type: Improvement > Components: executor >Reporter: Benno Evers >Assignee: Benno Evers > Labels: executor, executors > Fix For: 1.5.0 > > > All executors have the option to send out a TASK_STARTING status update to > signal to the scheduler that they received the command to launch the task. > It would be good if our built-in executors would do this, for reasons laid > out in > https://mail-archives.apache.org/mod_mbox/mesos-dev/201708.mbox/%3CCA%2B9TLTzkEVM0CKvY%2B%3D0%3DwjrN6hYFAt0401Y7b8tysDWx1WZzdw%40mail.gmail.com%3E > This will also fix MESOS-6790. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8112) DefaultExecutorTest.ResourceLimitation is flaky
James Peach created MESOS-8112: -- Summary: DefaultExecutorTest.ResourceLimitation is flaky Key: MESOS-8112 URL: https://issues.apache.org/jira/browse/MESOS-8112 Project: Mesos Issue Type: Bug Components: flaky, test Reporter: James Peach As seen in CI builds, the {{MesosContainerizer/DefaultExecutorTest.ResourceLimitation/0}} test can be flaky {noformat}[ RUN ] MesosContainerizer/DefaultExecutorTest.ResourceLimitation/0 I1017 21:37:55.179539 3528 cluster.cpp:162] Creating default 'local' authorizer I1017 21:37:55.182804 3529 master.cpp:445] Master 0a7cd77c-8bc0-4fdc-b6c5-918b7ffc392e (42cd332f4072) started on 172.17.0.2:33744 I1017 21:37:55.182847 3529 master.cpp:447] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/1FtpuJ/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.5.0/_inst/share/mesos/webui" --work_dir="/tmp/1FtpuJ/master" --zk_session_timeout="10secs" I1017 
21:37:55.183141 3529 master.cpp:496] Master only allowing authenticated frameworks to register I1017 21:37:55.183153 3529 master.cpp:502] Master only allowing authenticated agents to register I1017 21:37:55.183161 3529 master.cpp:508] Master only allowing authenticated HTTP frameworks to register I1017 21:37:55.183167 3529 credentials.hpp:37] Loading credentials for authentication from '/tmp/1FtpuJ/credentials' I1017 21:37:55.183472 3529 master.cpp:552] Using default 'crammd5' authenticator I1017 21:37:55.183661 3529 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I1017 21:37:55.183862 3529 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I1017 21:37:55.184082 3529 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I1017 21:37:55.184257 3529 master.cpp:631] Authorization enabled I1017 21:37:55.184450 3536 hierarchical.cpp:171] Initialized hierarchical allocator process I1017 21:37:55.184551 3536 whitelist_watcher.cpp:77] No whitelist given I1017 21:37:55.187489 3536 master.cpp:2198] Elected as the leading master! 
I1017 21:37:55.187516 3536 master.cpp:1687] Recovering from registrar I1017 21:37:55.187728 3536 registrar.cpp:347] Recovering registrar I1017 21:37:55.188508 3536 registrar.cpp:391] Successfully fetched the registry (0B) in 745984ns I1017 21:37:55.188616 3536 registrar.cpp:495] Applied 1 operations in 37290ns; attempting to update the registry I1017 21:37:55.189162 3536 registrar.cpp:552] Successfully updated the registry in 491008ns I1017 21:37:55.189285 3536 registrar.cpp:424] Successfully recovered registrar I1017 21:37:55.190011 3531 hierarchical.cpp:209] Skipping recovery of hierarchical allocator: nothing to recover I1017 21:37:55.190115 3534 master.cpp:1791] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register W1017 21:37:55.195062 3528 process.cpp:3194] Attempted to spawn already running process files@172.17.0.2:33744 I1017 21:37:55.195956 3528 containerizer.cpp:292] Using isolation { environment_secret, network/cni, filesystem/posix, disk/du } W1017 21:37:55.196488 3528 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges W1017 21:37:55.196630 3528 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges I1017 21:37:55.196662 3528 provisioner.cpp:255] Using default backend 'copy' I1017 21:37:55.198724 3528 cluster.cpp:448] Creating default 'local' authorizer I1017 21:37:55.200865 3535 slave.cpp:254] Mesos agent started on (724)@172.17.0.2:33744 I1017 21:37:55.200907 3535 slave.cpp:255] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://"; --appc_store_dir="/tmp/MesosContainerizer_DefaultE
[jira] [Commented] (MESOS-8112) DefaultExecutorTest.ResourceLimitation is flaky
[ https://issues.apache.org/jira/browse/MESOS-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209913#comment-16209913 ] James Peach commented on MESOS-8112: {{ContentType/AgentAPITest.GetContainers/1}} might also be fallout from the same changes in MESOS-7963 > DefaultExecutorTest.ResourceLimitation is flaky > --- > > Key: MESOS-8112 > URL: https://issues.apache.org/jira/browse/MESOS-8112 > Project: Mesos > Issue Type: Bug > Components: flaky, test >Reporter: James Peach > > As seen in CI builds, the > {{MesosContainerizer/DefaultExecutorTest.ResourceLimitation/0}} test can be > flaky > {noformat}[ RUN ] > MesosContainerizer/DefaultExecutorTest.ResourceLimitation/0 > I1017 21:37:55.179539 3528 cluster.cpp:162] Creating default 'local' > authorizer > I1017 21:37:55.182804 3529 master.cpp:445] Master > 0a7cd77c-8bc0-4fdc-b6c5-918b7ffc392e (42cd332f4072) started on > 172.17.0.2:33744 > I1017 21:37:55.182847 3529 master.cpp:447] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/1FtpuJ/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-1.5.0/_inst/share/mesos/webui" > --work_dir="/tmp/1FtpuJ/master" --zk_session_timeout="10secs" > I1017 21:37:55.183141 3529 master.cpp:496] Master only allowing > authenticated frameworks to register > I1017 21:37:55.183153 3529 master.cpp:502] Master only allowing > authenticated agents to register > I1017 21:37:55.183161 3529 master.cpp:508] Master only allowing > authenticated HTTP frameworks to register > I1017 21:37:55.183167 3529 credentials.hpp:37] Loading credentials for > authentication from '/tmp/1FtpuJ/credentials' > I1017 21:37:55.183472 3529 master.cpp:552] Using default 'crammd5' > authenticator > I1017 21:37:55.183661 3529 http.cpp:1045] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I1017 21:37:55.183862 3529 http.cpp:1045] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I1017 21:37:55.184082 3529 http.cpp:1045] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I1017 21:37:55.184257 3529 master.cpp:631] Authorization enabled > I1017 21:37:55.184450 3536 hierarchical.cpp:171] Initialized hierarchical > allocator process > I1017 21:37:55.184551 3536 whitelist_watcher.cpp:77] No whitelist given > I1017 21:37:55.187489 3536 master.cpp:2198] Elected as the leading master! 
> I1017 21:37:55.187516 3536 master.cpp:1687] Recovering from registrar > I1017 21:37:55.187728 3536 registrar.cpp:347] Recovering registrar > I1017 21:37:55.188508 3536 registrar.cpp:391] Successfully fetched the > registry (0B) in 745984ns > I1017 21:37:55.188616 3536 registrar.cpp:495] Applied 1 operations in > 37290ns; attempting to update the registry > I1017 21:37:55.189162 3536 registrar.cpp:552] Successfully updated the > registry in 491008ns > I1017 21:37:55.189285 3536 registrar.cpp:424] Successfully recovered > registrar > I1017 21:37:55.190011 3531 hierarchical.cpp:209] Skipping recovery of > hierarchical allocator: nothing to recover > I1017 21:37:55.190115 3534 master.cpp:1791] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > W1017 21:37:55.195062 3528 process.cpp:3194] Attempted to spawn already > running process files@172.17.0.2:33744 > I1017 21:37:55.195956 3528 containerizer.cpp:292] Using isolation { > environment_secret, network/cni, filesystem/posix, disk/du } > W1017 21:37:55.196488 3528 backend.cpp:76] Failed to create 'aufs' backend: > AufsBackend requires root privileges > W1017 21:37:55.196630 3528 backend.cpp:76] Failed to create
[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
[ https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7742: --- Attachment: AgentAPITest.LaunchNestedContainerSession-badrun.txt > ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky > -- > > Key: MESOS-7742 > URL: https://issues.apache.org/jira/browse/MESOS-7742 > Project: Mesos > Issue Type: Bug >Reporter: Vinod Kone >Assignee: Gastón Kleiman > Labels: flaky-test, mesosphere-oncall > Attachments: AgentAPITest.LaunchNestedContainerSession-badrun.txt > > > Observed this on ASF CI. > [~gkleiman] mind triaging this? > {code} > [ RUN ] > ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0 > I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' > authorizer > I0629 05:49:33.182234 25306 master.cpp:436] Master > 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on > 172.17.0.3:45726 > I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" - > -allocator="HierarchicalDRF" --authenticate_agents="true" > --authenticate_frameworks="true" --authenticate_http_frameworks="true" > --authenticate_http_readonly="true" --au > thenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/a5h5J3/credentials" > --framework_sorter="drf" --help="false" --hostn > ame_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" > --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="10 > 00" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" > --registry="in_memory" --registry_fetch_timeout="1mins" > 
--registry_gc_interval="15mins" --registr > y_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" - > -version="false" --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs" > I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing > authenticated frameworks to register > I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing > authenticated agents to register > I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing > authenticated HTTP frameworks to register > I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for > authentication from '/tmp/a5h5J3/credentials' > I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' > authenticator > I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled > I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical > allocator process > I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given > I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master! 
> I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar > I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar > I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the > registry (0B) in 183040ns > I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in > 6441ns; attempting to update the registry > I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the > registry in 147200ns > I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered > registrar > I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of > hierarchical allocator: nothing to recover > I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: > posix/cpu,posix/mem,filesystem/posix,network/cni > W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: > AufsBackend requires root privileges > W0629 05:49:33.187363 25301 backend.cpp:76] Failed to create 'bind' backend: > BindBackend requires root privileges > I0629 05:49:33.187396 25301 provisione
[jira] [Commented] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
[ https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16210083#comment-16210083 ] Alexander Rukletsov commented on MESOS-7742: Observed a similar failure for a different test: {{AgentAPITest.LaunchNestedContainerSession}}. Log attached. > ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky > -- > > Key: MESOS-7742 > URL: https://issues.apache.org/jira/browse/MESOS-7742 > Project: Mesos > Issue Type: Bug >Reporter: Vinod Kone >Assignee: Gastón Kleiman > Labels: flaky-test, mesosphere-oncall > Attachments: AgentAPITest.LaunchNestedContainerSession-badrun.txt > > > Observed this on ASF CI. > [~gkleiman] mind triaging this? > {code} > [ RUN ] > ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0 > I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' > authorizer > I0629 05:49:33.182234 25306 master.cpp:436] Master > 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on > 172.17.0.3:45726 > I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" - > -allocator="HierarchicalDRF" --authenticate_agents="true" > --authenticate_frameworks="true" --authenticate_http_frameworks="true" > --authenticate_http_readonly="true" --au > thenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/a5h5J3/credentials" > --framework_sorter="drf" --help="false" --hostn > ame_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" > --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="10 > 00" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" > --registry="in_memory" --registry_fetch_timeout="1mins" > --registry_gc_interval="15mins" --registr > y_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" - > -version="false" --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs" > I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing > authenticated frameworks to register > I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing > authenticated agents to register > I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing > authenticated HTTP frameworks to register > I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for > authentication from '/tmp/a5h5J3/credentials' > I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' > authenticator > I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled > I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical > allocator process > I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given > I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master! 
> I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar > I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar > I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the > registry (0B) in 183040ns > I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in > 6441ns; attempting to update the registry > I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the > registry in 147200ns > I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered > registrar > I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of > hierarchical allocator: nothing to recover > I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: > posix/cpu,posix/mem,filesystem/posix,network/cni > W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: > AufsBackend requires root privileges > W0629 05:49:33.187363 25301 backend.cpp:76] Failed to cre
[jira] [Updated] (MESOS-8112) DefaultExecutorTest.ResourceLimitation is flaky
[ https://issues.apache.org/jira/browse/MESOS-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-8112: --- Attachment: GetContainers-goodrun.txt GetContainers-badrun.txt > DefaultExecutorTest.ResourceLimitation is flaky > --- > > Key: MESOS-8112 > URL: https://issues.apache.org/jira/browse/MESOS-8112 > Project: Mesos > Issue Type: Bug > Components: flaky, test >Reporter: James Peach > Attachments: GetContainers-badrun.txt, GetContainers-goodrun.txt > > > As seen in CI builds, the > {{MesosContainerizer/DefaultExecutorTest.ResourceLimitation/0}} test can be > flaky > {noformat}[ RUN ] > MesosContainerizer/DefaultExecutorTest.ResourceLimitation/0 > I1017 21:37:55.179539 3528 cluster.cpp:162] Creating default 'local' > authorizer > I1017 21:37:55.182804 3529 master.cpp:445] Master > 0a7cd77c-8bc0-4fdc-b6c5-918b7ffc392e (42cd332f4072) started on > 172.17.0.2:33744 > I1017 21:37:55.182847 3529 master.cpp:447] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/1FtpuJ/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-1.5.0/_inst/share/mesos/webui" > --work_dir="/tmp/1FtpuJ/master" --zk_session_timeout="10secs" > I1017 21:37:55.183141 3529 master.cpp:496] Master only allowing > authenticated frameworks to register > I1017 21:37:55.183153 3529 master.cpp:502] Master only allowing > authenticated agents to register > I1017 21:37:55.183161 3529 master.cpp:508] Master only allowing > authenticated HTTP frameworks to register > I1017 21:37:55.183167 3529 credentials.hpp:37] Loading credentials for > authentication from '/tmp/1FtpuJ/credentials' > I1017 21:37:55.183472 3529 master.cpp:552] Using default 'crammd5' > authenticator > I1017 21:37:55.183661 3529 http.cpp:1045] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I1017 21:37:55.183862 3529 http.cpp:1045] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I1017 21:37:55.184082 3529 http.cpp:1045] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I1017 21:37:55.184257 3529 master.cpp:631] Authorization enabled > I1017 21:37:55.184450 3536 hierarchical.cpp:171] Initialized hierarchical > allocator process > I1017 21:37:55.184551 3536 whitelist_watcher.cpp:77] No whitelist given > I1017 21:37:55.187489 3536 master.cpp:2198] Elected as the leading master! 
> I1017 21:37:55.187516 3536 master.cpp:1687] Recovering from registrar > I1017 21:37:55.187728 3536 registrar.cpp:347] Recovering registrar > I1017 21:37:55.188508 3536 registrar.cpp:391] Successfully fetched the > registry (0B) in 745984ns > I1017 21:37:55.188616 3536 registrar.cpp:495] Applied 1 operations in > 37290ns; attempting to update the registry > I1017 21:37:55.189162 3536 registrar.cpp:552] Successfully updated the > registry in 491008ns > I1017 21:37:55.189285 3536 registrar.cpp:424] Successfully recovered > registrar > I1017 21:37:55.190011 3531 hierarchical.cpp:209] Skipping recovery of > hierarchical allocator: nothing to recover > I1017 21:37:55.190115 3534 master.cpp:1791] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > W1017 21:37:55.195062 3528 process.cpp:3194] Attempted to spawn already > running process files@172.17.0.2:33744 > I1017 21:37:55.195956 3528 containerizer.cpp:292] Using isolation { > environment_secret, network/cni, filesystem/posix, disk/du } > W1017 21:37:55.196488 3528 backend.cpp:76] Failed to create 'aufs' backend: > AufsBackend requires root privileges > W1017 21:37:55.196630 3528 backend.cpp:76] Fa
[jira] [Updated] (MESOS-7726) MasterTest.IgnoreOldAgentReregistration test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7726: --- Attachment: IgnoreOldAgentReregistration-badrun.txt IgnoreOldAgentReregistration-goodrun.txt > MasterTest.IgnoreOldAgentReregistration test is flaky > - > > Key: MESOS-7726 > URL: https://issues.apache.org/jira/browse/MESOS-7726 > Project: Mesos > Issue Type: Bug >Reporter: Vinod Kone >Assignee: Neil Conway > Labels: flaky-test, mesosphere-oncall > Attachments: IgnoreOldAgentReregistration-badrun.txt, > IgnoreOldAgentReregistration-goodrun.txt > > > Observed this on ASF CI. > {code} > [ RUN ] MasterTest.IgnoreOldAgentReregistration > I0627 05:23:06.031154 4917 cluster.cpp:162] Creating default 'local' > authorizer > I0627 05:23:06.033433 4945 master.cpp:438] Master > a8778782-0da1-49a5-9cb8-9f6d11701733 (c43debbe7e32) started on > 172.17.0.4:41747 > I0627 05:23:06.033457 4945 master.cpp:440] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/2BARnF/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui" > --work_dir="/tmp/2BARnF/master" --zk_session_timeout="10secs" > I0627 05:23:06.033771 4945 master.cpp:490] Master only allowing > authenticated frameworks to register > I0627 05:23:06.033787 4945 master.cpp:504] Master only allowing > authenticated agents to register > I0627 05:23:06.033798 4945 master.cpp:517] Master only allowing > authenticated HTTP frameworks to register > I0627 05:23:06.033812 4945 credentials.hpp:37] Loading credentials for > authentication from '/tmp/2BARnF/credentials' > I0627 05:23:06.034080 4945 master.cpp:562] Using default 'crammd5' > authenticator > I0627 05:23:06.034221 4945 http.cpp:974] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0627 05:23:06.034409 4945 http.cpp:974] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0627 05:23:06.034569 4945 http.cpp:974] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0627 05:23:06.034688 4945 master.cpp:642] Authorization enabled > I0627 05:23:06.034862 4938 whitelist_watcher.cpp:77] No whitelist given > I0627 05:23:06.034868 4950 hierarchical.cpp:169] Initialized hierarchical > allocator process > I0627 05:23:06.037211 4957 master.cpp:2161] Elected as the leading master! 
> I0627 05:23:06.037236 4957 master.cpp:1700] Recovering from registrar > I0627 05:23:06.037333 4938 registrar.cpp:345] Recovering registrar > I0627 05:23:06.038146 4938 registrar.cpp:389] Successfully fetched the > registry (0B) in 768256ns > I0627 05:23:06.038290 4938 registrar.cpp:493] Applied 1 operations in > 30798ns; attempting to update the registry > I0627 05:23:06.038861 4938 registrar.cpp:550] Successfully updated the > registry in 510976ns > I0627 05:23:06.038960 4938 registrar.cpp:422] Successfully recovered > registrar > I0627 05:23:06.039364 4941 hierarchical.cpp:207] Skipping recovery of > hierarchical allocator: nothing to recover > I0627 05:23:06.039594 4958 master.cpp:1799] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > I0627 05:23:06.043999 4917 containerizer.cpp:230] Using isolation: > posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret > W0627 05:23:06.044456 4917 backend.cpp:76] Failed to create 'aufs' backend: > AufsBackend requires root privileges > W0627 05:23:06.044548 4917 backend.cpp:76] Failed to create 'bind' backend: > BindBackend requires root privileges > I0627 05:23:06.044580 4917
[jira] [Created] (MESOS-8113) Display task names in Alphanum pattern
Varun Gupta created MESOS-8113: -- Summary: Display task names in Alphanum pattern Key: MESOS-8113 URL: https://issues.apache.org/jira/browse/MESOS-8113 Project: Mesos Issue Type: Task Components: webui Affects Versions: 1.4.0 Reporter: Varun Gupta Priority: Minor Fix For: 1.4.0 Attachments: current_lexicographic.png, proposed_alphanum.png Task names are currently sorted in lexicographic order, which makes them inconvenient to read. I propose sorting them in alphanum (natural) order instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8113) Display task names in Alphanum pattern
[ https://issues.apache.org/jira/browse/MESOS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Gupta updated MESOS-8113: --- Attachment: current_lexicographic.png proposed_alphanum.png > Display task names in Alphanum pattern > -- > > Key: MESOS-8113 > URL: https://issues.apache.org/jira/browse/MESOS-8113 > Project: Mesos > Issue Type: Task > Components: webui >Affects Versions: 1.4.0 >Reporter: Varun Gupta >Priority: Minor > Fix For: 1.4.0 > > Attachments: current_lexicographic.png, proposed_alphanum.png > > > Task names are currently sorted in lexicographic order, which makes them > inconvenient to read. I propose sorting them in alphanum (natural) order > instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
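For context on the MESOS-8113 proposal: "alphanum" (natural) ordering differs from plain lexicographic ordering in that runs of digits are compared by numeric value, so "task-2" sorts before "task-10". The following is a minimal, hypothetical C++ comparator sketching the idea; it is not the Mesos webui implementation (which would live in the AngularJS frontend), and the function name is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical sketch of an "alphanum" (natural) comparator: digit runs are
// compared numerically, all other characters lexicographically.
bool alphanumLess(const std::string& a, const std::string& b) {
  size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    if (std::isdigit((unsigned char)a[i]) &&
        std::isdigit((unsigned char)b[j])) {
      // Extract complete digit runs and compare their numeric values.
      size_t i2 = i, j2 = j;
      while (i2 < a.size() && std::isdigit((unsigned char)a[i2])) ++i2;
      while (j2 < b.size() && std::isdigit((unsigned char)b[j2])) ++j2;
      long long na = std::stoll(a.substr(i, i2 - i));
      long long nb = std::stoll(b.substr(j, j2 - j));
      if (na != nb) return na < nb;
      i = i2;
      j = j2;
    } else {
      if (a[i] != b[j]) return a[i] < b[j];
      ++i;
      ++j;
    }
  }
  // A proper prefix sorts first.
  return a.size() < b.size();
}
```

With this comparator, `{"task-10", "task-2", "task-1"}` sorts to `task-1, task-2, task-10`, whereas plain lexicographic order yields `task-1, task-10, task-2` (the ordering the attached `current_lexicographic.png` presumably shows).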
[jira] [Updated] (MESOS-7111) HttpFaultToleranceTest.SchedulerFailoverFrameworkToExecutorMessage segfaults
[ https://issues.apache.org/jira/browse/MESOS-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7111: --- Labels: flaky-test mesosphere (was: mesosphere) > HttpFaultToleranceTest.SchedulerFailoverFrameworkToExecutorMessage segfaults > > > Key: MESOS-7111 > URL: https://issues.apache.org/jira/browse/MESOS-7111 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 1.2.0 > Environment: ubuntu-16 >Reporter: Benjamin Bannier > Labels: flaky-test, mesosphere > > We observed a segfault in > {{HttpFaultToleranceTest.SchedulerFailoverFrameworkToExecutorMessage}} in > internal CI on an ubuntu16 machine. Note that ubuntu16 uses gcc-6. > {code} > [ RUN ] > HttpFaultToleranceTest.SchedulerFailoverFrameworkToExecutorMessage > I0210 02:47:31.260174 19578 cluster.cpp:160] Creating default 'local' > authorizer > I0210 02:47:31.261225 19597 master.cpp:383] Master > d8129420-2a04-48e7-9b28-6b0a0af73168 (ip-10-150-111-24.ec2.internal) started > on 10.150.111.24:33608 > I0210 02:47:31.261281 19597 master.cpp:385] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="false" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/fBrqHi/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/fBrqHi/master" > --zk_session_timeout="10secs" > I0210 02:47:31.261404 19597 master.cpp:437] Master allowing unauthenticated > frameworks to register > I0210 02:47:31.261411 19597 master.cpp:449] Master only allowing > authenticated agents to register > I0210 02:47:31.261415 19597 master.cpp:462] Master only allowing > authenticated HTTP frameworks to register > I0210 02:47:31.261420 19597 credentials.hpp:37] Loading credentials for > authentication from '/tmp/fBrqHi/credentials' > I0210 02:47:31.261488 19597 master.cpp:507] Using default 'crammd5' > authenticator > I0210 02:47:31.261530 19597 http.cpp:919] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0210 02:47:31.261591 19597 http.cpp:919] Using default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0210 02:47:31.261631 19597 http.cpp:919] Using default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0210 02:47:31.261698 19597 master.cpp:587] Authorization enabled > I0210 02:47:31.261754 19601 whitelist_watcher.cpp:77] No whitelist given > I0210 02:47:31.261754 19602 hierarchical.cpp:161] Initialized hierarchical > allocator process > I0210 02:47:31.262462 19597 master.cpp:2124] Elected as the leading master! 
> I0210 02:47:31.262482 19597 master.cpp:1646] Recovering from registrar > I0210 02:47:31.262545 19603 registrar.cpp:329] Recovering registrar > I0210 02:47:31.262774 19602 registrar.cpp:362] Successfully fetched the > registry (0B) in 201984ns > I0210 02:47:31.262809 19602 registrar.cpp:461] Applied 1 operations in > 2963ns; attempting to update the registry > I0210 02:47:31.263062 19599 registrar.cpp:506] Successfully updated the > registry in 214016ns > I0210 02:47:31.263119 19599 registrar.cpp:392] Successfully recovered > registrar > I0210 02:47:31.263267 19597 master.cpp:1762] Recovered 0 agents from the > registry (172B); allowing 10mins for agents to re-register > I0210 02:47:31.263295 19598 hierarchical.cpp:188] Skipping recovery of > hierarchical allocator: nothing to recover > I0210 02:47:31.264645 19578 cluster.cpp:446] Creating default 'local' > authorizer > I0210 02:47:31.265029 19598 slave.cpp:211] Mesos agent started on > (105)@10.150.111.24:33608 > I0210 02:47:31.265187 19578 scheduler.cpp:184] Version: 1.3.0 > I0210 02:47:31.265043 19598 slave.cpp:212] Flags at startup: --acls="" > --appc_simple_discovery_uri_pr
[jira] [Commented] (MESOS-8112) DefaultExecutorTest.ResourceLimitation is flaky
[ https://issues.apache.org/jira/browse/MESOS-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16210353#comment-16210353 ] James Peach commented on MESOS-8112: The first failure in this test run is {{../../src/tests/default_executor_tests.cpp:1460: Failed to wait 15secs for failed}}. This indicates that we got the {{starting}} and {{running}} status updates but the final {{failure}} took too long. However, the master forwarded the {{RUNNING}} update here: {noformat} I1017 21:37:55.499879 3532 master.cpp:7055] Forwarding status update TASK_RUNNING (UUID: 86bb612c-9d48-4e85-a0f1-89820ea65fa1) for task ffc4604c-a6cb-4ced-a969-fc6b9e6f955d of framework 0a7cd77c-8bc0-4fdc-b6c5-918b7ffc392e- ... 1017 21:37:55.742033 3533 containerizer.cpp:2677] Container 4789abbb-04c9-4d6d-b561-f44b34ec47d2 has reached its limit for resource [{"allocation_info":{"role":"*"},"name":"disk","scalar":{"value":20.0},"type":"SCALAR"}] and will be terminated ... I1017 21:39:22.944377 3535 master.cpp:1417] Framework 0a7cd77c-8bc0-4fdc-b6c5-918b7ffc392e- (default) disconnected ... I1017 21:39:22.946893 3529 master.cpp:9157] Updating the state of task ffc4604c-a6cb-4ced-a969-fc6b9e6f955d of framework 0a7cd77c-8bc0-4fdc-b6c5-918b7ffc392e- (latest state: TASK_KILLED, status update state: TASK_KILLED) {noformat} So from the master's perspective, the test framework disconnected? Or did this happen once the test failed and we started tearing it down? Later in the test log: {noformat} ../../src/tests/default_executor_tests.cpp:1427: Failure Actual function call count doesn't match EXPECT_CALL(*scheduler, update(_, _))... 
Expected: to be called twice Actual: called once - unsatisfied and active {noformat} This seems to indicate that we only got 1 of 3 expected status updates, but if that was true I would expect to see a failure on {{AWAIT_READY(running)}} and I can't find that here :( > DefaultExecutorTest.ResourceLimitation is flaky > --- > > Key: MESOS-8112 > URL: https://issues.apache.org/jira/browse/MESOS-8112 > Project: Mesos > Issue Type: Bug > Components: flaky, test >Reporter: James Peach > Attachments: GetContainers-badrun.txt, GetContainers-goodrun.txt > > > As seen in CI builds, the > {{MesosContainerizer/DefaultExecutorTest.ResourceLimitation/0}} test can be > flaky > {noformat}[ RUN ] > MesosContainerizer/DefaultExecutorTest.ResourceLimitation/0 > I1017 21:37:55.179539 3528 cluster.cpp:162] Creating default 'local' > authorizer > I1017 21:37:55.182804 3529 master.cpp:445] Master > 0a7cd77c-8bc0-4fdc-b6c5-918b7ffc392e (42cd332f4072) started on > 172.17.0.2:33744 > I1017 21:37:55.182847 3529 master.cpp:447] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/1FtpuJ/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-1.5.0/_inst/share/mesos/webui" > --work_dir="/tmp/1FtpuJ/master" --zk_session_timeout="10secs" > I1017 21:37:55.183141 3529 master.cpp:496] Master only allowing > authenticated frameworks to register > I1017 21:37:55.183153 3529 master.cpp:502] Master only allowing > authenticated agents to register > I1017 21:37:55.183161 3529 master.cpp:508] Master only allowing > authenticated HTTP frameworks to register > I1017 21:37:55.183167 3529 credentials.hpp:37] Loading credentials for > authentication from '/tmp/1FtpuJ/credentials' > I1017 21:37:55.183472 3529 master.cpp:552] Using default 'crammd5' > authenticator > I1017 21:37:55.183661 3529 http.cpp:1045] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I1017 21:37:55.183862 3529 http.cpp:1045] Creating default
[jira] [Updated] (MESOS-8112) DefaultExecutorTest.ResourceLimitation is flaky
[ https://issues.apache.org/jira/browse/MESOS-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Peach updated MESOS-8112: --- Attachment: ResourceLimitation-badrun.txt > DefaultExecutorTest.ResourceLimitation is flaky > --- > > Key: MESOS-8112 > URL: https://issues.apache.org/jira/browse/MESOS-8112 > Project: Mesos > Issue Type: Bug > Components: flaky, test >Reporter: James Peach > Attachments: GetContainers-badrun.txt, GetContainers-goodrun.txt, > ResourceLimitation-badrun.txt > > > As seen in CI builds, the > {{MesosContainerizer/DefaultExecutorTest.ResourceLimitation/0}} test can be > flaky > {noformat}[ RUN ] > MesosContainerizer/DefaultExecutorTest.ResourceLimitation/0 > I1017 21:37:55.179539 3528 cluster.cpp:162] Creating default 'local' > authorizer > I1017 21:37:55.182804 3529 master.cpp:445] Master > 0a7cd77c-8bc0-4fdc-b6c5-918b7ffc392e (42cd332f4072) started on > 172.17.0.2:33744 > I1017 21:37:55.182847 3529 master.cpp:447] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/1FtpuJ/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-1.5.0/_inst/share/mesos/webui" > --work_dir="/tmp/1FtpuJ/master" --zk_session_timeout="10secs" > I1017 21:37:55.183141 3529 master.cpp:496] Master only allowing > authenticated frameworks to register > I1017 21:37:55.183153 3529 master.cpp:502] Master only allowing > authenticated agents to register > I1017 21:37:55.183161 3529 master.cpp:508] Master only allowing > authenticated HTTP frameworks to register > I1017 21:37:55.183167 3529 credentials.hpp:37] Loading credentials for > authentication from '/tmp/1FtpuJ/credentials' > I1017 21:37:55.183472 3529 master.cpp:552] Using default 'crammd5' > authenticator > I1017 21:37:55.183661 3529 http.cpp:1045] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I1017 21:37:55.183862 3529 http.cpp:1045] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I1017 21:37:55.184082 3529 http.cpp:1045] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I1017 21:37:55.184257 3529 master.cpp:631] Authorization enabled > I1017 21:37:55.184450 3536 hierarchical.cpp:171] Initialized hierarchical > allocator process > I1017 21:37:55.184551 3536 whitelist_watcher.cpp:77] No whitelist given > I1017 21:37:55.187489 3536 master.cpp:2198] Elected as the leading master! 
> I1017 21:37:55.187516 3536 master.cpp:1687] Recovering from registrar > I1017 21:37:55.187728 3536 registrar.cpp:347] Recovering registrar > I1017 21:37:55.188508 3536 registrar.cpp:391] Successfully fetched the > registry (0B) in 745984ns > I1017 21:37:55.188616 3536 registrar.cpp:495] Applied 1 operations in > 37290ns; attempting to update the registry > I1017 21:37:55.189162 3536 registrar.cpp:552] Successfully updated the > registry in 491008ns > I1017 21:37:55.189285 3536 registrar.cpp:424] Successfully recovered > registrar > I1017 21:37:55.190011 3531 hierarchical.cpp:209] Skipping recovery of > hierarchical allocator: nothing to recover > I1017 21:37:55.190115 3534 master.cpp:1791] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > W1017 21:37:55.195062 3528 process.cpp:3194] Attempted to spawn already > running process files@172.17.0.2:33744 > I1017 21:37:55.195956 3528 containerizer.cpp:292] Using isolation { > environment_secret, network/cni, filesystem/posix, disk/du } > W1017 21:37:55.196488 3528 backend.cpp:76] Failed to create 'aufs' backend: > AufsBackend requires root privileges > W1017 21:37:55.196630 3528 backend.cpp:76] Failed to create 'bin