[jira] [Created] (MESOS-10067) Update the `update()` method of cgroups subsystem interface to handle container resource limits
Qian Zhang created MESOS-10067:
--
Summary: Update the `update()` method of cgroups subsystem interface to handle container resource limits
Key: MESOS-10067
URL: https://issues.apache.org/jira/browse/MESOS-10067
Project: Mesos
Issue Type: Task
Components: containerization
Reporter: Qian Zhang
Assignee: Qian Zhang

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (MESOS-10047) Update the CPU subsystem in the cgroup isolator to set container's CPU resource limits
[ https://issues.apache.org/jira/browse/MESOS-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qian Zhang reassigned MESOS-10047:
--
Assignee: Qian Zhang

> Update the CPU subsystem in the cgroup isolator to set container's CPU resource limits
> --
>
> Key: MESOS-10047
> URL: https://issues.apache.org/jira/browse/MESOS-10047
> Project: Mesos
> Issue Type: Task
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Major
[jira] [Created] (MESOS-10066) mesos-docker-executor process dies when agent stops. Recovery fails when agent returns
Dalton Matos Coelho Barreto created MESOS-10066:
---
Summary: mesos-docker-executor process dies when agent stops. Recovery fails when agent returns
Key: MESOS-10066
URL: https://issues.apache.org/jira/browse/MESOS-10066
Project: Mesos
Issue Type: Bug
Components: agent, containerization, docker, executor
Affects Versions: 1.7.3
Reporter: Dalton Matos Coelho Barreto

Hello all,

The documentation about Agent Recovery lists two conditions for recovery to be possible:
- The agent must have recovery enabled (default true?);
- The scheduler must register itself with checkpointing enabled.

In my tests I'm using Marathon as the scheduler, and Mesos itself sees Marathon as a checkpoint-enabled scheduler:

{noformat}
$ curl -sL 10.234.172.27:5050/state | jq '.frameworks[] | {"name": .name, "id": .id, "checkpoint": .checkpoint, "active": .active}'
{
  "name": "asgard-chronos",
  "id": "4783cf15-4fb1-4c75-90fe-44eeec5258a7-0001",
  "checkpoint": true,
  "active": true
}
{
  "name": "marathon",
  "id": "4783cf15-4fb1-4c75-90fe-44eeec5258a7-",
  "checkpoint": true,
  "active": true
}
{noformat}

Here is what I'm using:
# Mesos Master, 1.4.1
# Mesos Agent 1.7.3
# Using docker image {{mesos/mesos-centos:1.7.x}}
# Docker sock mounted from the host
# Docker binary also mounted from the host
# Marathon: 1.4.12
# Docker
{noformat}
Client: Docker Engine - Community
 Version: 19.03.5
 API version: 1.39 (downgraded from 1.40)
 Go version: go1.12.12
 Git commit: 633a0ea838
 Built: Wed Nov 13 07:22:05 2019
 OS/Arch: linux/amd64
 Experimental: false

Server: Docker Engine - Community
 Engine:
  Version: 18.09.2
  API version: 1.39 (minimum version 1.12)
  Go version: go1.10.6
  Git commit: 6247962
  Built: Sun Feb 10 03:42:13 2019
  OS/Arch: linux/amd64
  Experimental: false
{noformat}

h2. The problem

Here is the Marathon test app, a simple {{sleep 99d}} based on the {{debian}} docker image.
{noformat}
{
  "id": "/sleep",
  "cmd": "sleep 99d",
  "cpus": 0.1,
  "mem": 128,
  "disk": 0,
  "instances": 1,
  "constraints": [],
  "acceptedResourceRoles": [
    "*"
  ],
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "debian",
      "network": "HOST",
      "privileged": false,
      "parameters": [],
      "forcePullImage": true
    }
  },
  "labels": {},
  "portDefinitions": []
}
{noformat}

This task runs fine and gets scheduled on the right agent, which is running Mesos agent 1.7.3 (using the docker image {{mesos/mesos-centos:1.7.x}}). Here is a sample log:

{noformat}
mesos-slave_1 | I1205 13:24:21.391464 19849 slave.cpp:2403] Authorizing task 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' for framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-
mesos-slave_1 | I1205 13:24:21.392707 19849 slave.cpp:2846] Launching task 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' for framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-
mesos-slave_1 | I1205 13:24:21.392895 19849 paths.cpp:748] Creating sandbox '/var/lib/mesos/agent/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923'
mesos-slave_1 | I1205 13:24:21.394399 19849 paths.cpp:748] Creating sandbox '/var/lib/mesos/agent/meta/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923'
mesos-slave_1 | I1205 13:24:21.394918 19849 slave.cpp:9068] Launching executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7- with resources [{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"mem","scalar":{"value":32.0},"type":"SCALAR"}] in work directory '/var/lib/mesos/agent/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923'
mesos-slave_1 | I1205 13:24:21.396499 19849 slave.cpp:3078] Queued task 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' for executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-
mesos-slave_1 | I1205 13:24:21.397038 19849 slave.cpp:3526] Launching container 53ec0ef3-3290-476a-b2b6-385099e9b92
[jira] [Commented] (MESOS-9968) WWWAuthenticate header parsing fails when commas are in (quoted) realm
[ https://issues.apache.org/jira/browse/MESOS-9968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988921#comment-16988921 ]

Benno Evers commented on MESOS-9968:

1.8.x:
{noformat}
commit 21ec06ed44c1fbd2272081b20bdcee630759f52d
Author: Benjamin Bannier
Date: Mon Sep 23 10:24:50 2019 +0200

    Fixed parsing of HTTP authentication headers.

    This patch adds support for quoted strings in `Www-Authenticate` headers and allows to use spaces when delimiting authentication attributes of the form `WWW-Autenticate: a=b, c=d`, both of with are allowed by RFC2617.

    Review: https://reviews.apache.org/r/71534

commit 32d6937bee6c2c43d769daa7b95b33856b9b8364
Author: Benjamin Bannier
Date: Mon Sep 23 10:23:27 2019 +0200

    Cleaned up `HTTPTest.WWWAuthenticateHeader`.

    This patch removes a number of error-prone temporaries previously reused in the test.

    Review: https://reviews.apache.org/r/71533
{noformat}

1.9.x:
{noformat}
commit 5fa73f683c38c025b0e650de24474e0fdf95d1f4
Author: Benjamin Bannier
Date: Mon Sep 23 10:24:50 2019 +0200

    Fixed parsing of HTTP authentication headers.

    This patch adds support for quoted strings in `Www-Authenticate` headers and allows to use spaces when delimiting authentication attributes of the form `WWW-Autenticate: a=b, c=d`, both of with are allowed by RFC2617.

    Review: https://reviews.apache.org/r/71534

commit 5f6d218a3123ec35b3a14ce20e72b5ca3594cef2
Author: Benjamin Bannier
Date: Mon Sep 23 10:23:27 2019 +0200

    Cleaned up `HTTPTest.WWWAuthenticateHeader`.

    This patch removes a number of error-prone temporaries previously reused in the test.

    Review: https://reviews.apache.org/r/71533
{noformat}

> WWWAuthenticate header parsing fails when commas are in (quoted) realm
> --
>
> Key: MESOS-9968
> URL: https://issues.apache.org/jira/browse/MESOS-9968
> Project: Mesos
> Issue Type: Bug
> Components: HTTP API, libprocess
> Reporter: Jan Schlicht
> Assignee: Benjamin Bannier
> Priority: Major
> Fix For: 1.10.0
>
> This was discovered when trying to launch the {{[nvcr.io/nvidia/tensorflow:19.08-py3|http://nvcr.io/nvidia/tensorflow:19.08-py3]}} image using the Mesos containerizer. This launch fails with
> {noformat}
> Failed to launch container: Failed to get WWW-Authenticate header: Unexpected auth-param format: 'realm="https://nvcr.io/proxy_auth?scope=repository:nvidia/tensorflow:pull' in 'realm="https://nvcr.io/proxy_auth?scope=repository:nvidia/tensorflow:pull,push";'
> {noformat}
> This is because the [header tokenization in libprocess|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/http.cpp#L640] can't handle commas in quoted realm values.
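
The failure quoted above comes from treating every comma as a delimiter, including one inside a quoted realm value. The Python sketch below illustrates one way to split auth-params only on commas that fall outside double quotes. It is not the libprocess code or the patch under review; the function names `split_auth_params` and `parse_auth_params` are hypothetical, the input is assumed to be the parameter list with the auth scheme already removed, and backslash escapes inside quoted values are preserved rather than decoded.

```python
def split_auth_params(value):
    """Split a comma-separated auth-param list, ignoring commas inside quotes."""
    params = []
    current = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            # Character following a backslash inside a quoted string.
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            # Only an unquoted comma ends the current parameter.
            item = "".join(current).strip()
            if item:
                params.append(item)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        params.append(tail)
    return params


def parse_auth_params(value):
    """Return a dict mapping auth-param names to values (outer quotes removed)."""
    result = {}
    for param in split_auth_params(value):
        name, sep, raw = param.partition("=")
        if not sep:
            raise ValueError("Unexpected auth-param format: %r" % param)
        raw = raw.strip()
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            raw = raw[1:-1]
        result[name.strip()] = raw
    return result


# The realm value is taken from the issue; the trailing `a=b` mirrors the
# `a=b, c=d` form mentioned in the commit message.
example = 'realm="https://nvcr.io/proxy_auth?scope=repository:nvidia/tensorflow:pull,push", a=b'
print(parse_auth_params(example))
```

With this approach the realm stays intact through `pull,push`, and a space after the delimiting comma is tolerated because each parameter is stripped before it is parsed.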
[jira] [Commented] (MESOS-10062) Implement relative path computation for stout
[ https://issues.apache.org/jira/browse/MESOS-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988766#comment-16988766 ]

Benjamin Bannier commented on MESOS-10062:
--
Reviews:
https://reviews.apache.org/r/71878/
https://reviews.apache.org/r/71879/
https://reviews.apache.org/r/71880/
https://reviews.apache.org/r/71881/
https://reviews.apache.org/r/71882/

> Implement relative path computation for stout
> -
>
> Key: MESOS-10062
> URL: https://issues.apache.org/jira/browse/MESOS-10062
> Project: Mesos
> Issue Type: Task
> Reporter: Benno Evers
> Assignee: Benjamin Bannier
> Priority: Major
>
> When using executor domain sockets, we might need to specify relative paths in order to stay below the path length limit of 108 characters.
> To do so, we should implement a `path::relative_path()` function in stout that can compute the relative path between two directories.
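
To illustrate the kind of computation the issue asks for, here is a minimal Python sketch of a relative-path function between two absolute POSIX paths. It is not the stout implementation from the linked reviews; the function name `relative_path`, the example directories, and the `SOCKET_PATH_LIMIT` constant are assumptions made for illustration, with 108 taken from the limit quoted in the issue.

```python
import posixpath

SOCKET_PATH_LIMIT = 108  # path length limit quoted in the issue


def relative_path(path, base):
    """Compute `path` relative to the directory `base` (both absolute)."""
    if not (posixpath.isabs(path) and posixpath.isabs(base)):
        raise ValueError("both paths must be absolute")

    # Normalize away '.', '..' and duplicate slashes, then split into components.
    path_parts = [p for p in posixpath.normpath(path).split("/") if p]
    base_parts = [p for p in posixpath.normpath(base).split("/") if p]

    # Count the leading components the two paths share.
    common = 0
    for a, b in zip(path_parts, base_parts):
        if a != b:
            break
        common += 1

    # Climb out of what is left of `base`, then descend into `path`.
    parts = [".."] * (len(base_parts) - common) + path_parts[common:]
    return "/".join(parts) if parts else "."


# Hypothetical directories, chosen only to show the shape of the result.
socket_path = "/var/run/mesos/executors/abc/socket"
working_dir = "/var/run/mesos/agent"

rel = relative_path(socket_path, working_dir)
print(rel, len(rel) < SOCKET_PATH_LIMIT)
```

For normalized absolute inputs this agrees with the standard library's `posixpath.relpath(path, start)`, which is a convenient cross-check. A relative path helps only when the process's working directory is close to the socket in the directory tree, since every level of separation adds a `..` component.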