[jira] [Created] (MESOS-10168) Add secrets support to the CSI service and volume managers
Greg Mann created MESOS-10168:

Summary: Add secrets support to the CSI service and volume managers
Key: MESOS-10168
URL: https://issues.apache.org/jira/browse/MESOS-10168
Project: Mesos
Issue Type: Task
Reporter: Greg Mann

We must update our CSI code to pass secrets to CSI drivers when staging/unstaging and publishing/unpublishing volumes.

We must ensure that we avoid writing any secrets to disk by holding a secret resolver in the appropriate component to resolve secrets associated with already-attached volumes during/after recovery.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (MESOS-10167) Mesos-websitebot fails due to wrong permissions of volumes mounted into Docker container
[ https://issues.apache.org/jira/browse/MESOS-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166527#comment-17166527 ]

Andrei Sekretenko edited comment on MESOS-10167 at 7/28/20, 4:27 PM:

Note that wrong permissions not allowing autom4te to create the cache directory *do* cause the misleading "no such file or directory" error message. See http://git.savannah.gnu.org/cgit/autoconf.git/tree/bin/autom4te.in#n998

was (Author: asekretenko):
Note that wrong permissions not allowing autom4te to create the cache directory *do* cause the misleading "no such file or directory" error. See http://git.savannah.gnu.org/cgit/autoconf.git/tree/bin/autom4te.in#n998

> Mesos-websitebot fails due to wrong permissions of volumes mounted into Docker container
>
> Key: MESOS-10167
> URL: https://issues.apache.org/jira/browse/MESOS-10167
> Project: Mesos
> Issue Type: Bug
> Components: project website
> Reporter: Andrei Sekretenko
> Priority: Minor
[jira] [Commented] (MESOS-10167) Mesos-websitebot fails due to wrong permissions of volumes mounted into Docker container
[ https://issues.apache.org/jira/browse/MESOS-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166516#comment-17166516 ]

Andrei Sekretenko commented on MESOS-10167:

I haven't managed to reproduce this locally on my system, neither with docker 19.03.6, build 369ce74a3c, nor with 19.03.8, build afacb8b7f.
[jira] [Created] (MESOS-10167) Mesos-websitebot fails due to wrong permissions of volumes mounted into Docker container
Andrei Sekretenko created MESOS-10167:

Summary: Mesos-websitebot fails due to wrong permissions of volumes mounted into Docker container
Key: MESOS-10167
URL: https://issues.apache.org/jira/browse/MESOS-10167
Project: Mesos
Issue Type: Bug
Components: project website
Reporter: Andrei Sekretenko

Last successful run was on Apr 7:
https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Websitebot/2464/

First failure:
https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Websitebot/2465/console

Build with added permissions dump
https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Websitebot/2525/console
shows that while the build scripts in the container are, as expected, running under "tempuser" (with the same uid as the user outside the container which pulls the git repositories), the directories with git repositories mounted into the container are owned by root:

{noformat}
19:06:21 uid=910(tempuser) gid=1001(tempuser) groups=1001(tempuser)
19:06:21 total 836
19:06:21 drwxr-xr-x 12 root root   4096 Jul  3 17:02 .
19:06:21 drwxr-xr-x  1 root root   4096 Jul  3 17:04 ..
19:06:21 drwxr-xr-x  6 root root   4096 Jun 29 14:12 3rdparty
19:06:21 drwxr-xr-x  2 root root   4096 Apr 15 14:33 bin
19:06:21 -rwxr-xr-x  1 root root   1294 Jul  3 17:02 bootstrap
19:06:21 -rw-r--r--  1 root root 536015 May 29 09:21 CHANGELOG
19:06:21 drwxr-xr-x  2 root root   4096 May 29 11:30 cmake
19:06:21 -rw-r--r--  1 root root   3990 May  7 13:40 CMakeLists.txt
19:06:21 -rw-r--r--  1 root root 105737 May  7 13:40 configure.ac
19:06:21 lrwxrwxrwx  1 root root     31 Apr 15 14:33 CONTRIBUTING.md -> ./docs/beginner-contribution.md
19:06:21 drwxr-xr-x  6 root root   4096 May 28 19:18 docs
19:06:21 -rw-r--r--  1 root root  63778 Apr 15 14:33 Doxyfile
19:06:21 drwxr-xr-x  8 root root   4096 Jul  3 17:02 .git
19:06:21 -rw-r--r--  1 root root     99 Apr 15 14:33 .gitattributes
19:06:21 drwxr-xr-x  3 root root   4096 Aug 27  2019 include
19:06:21 -rw-r--r--  1 root root  66156 Apr 15 14:33 LICENSE
19:06:21 drwxr-xr-x  2 root root   4096 Apr 15 14:33 m4
19:06:21 -rw-r--r--  1 root root   3842 Apr 15 14:33 Makefile.am
19:06:21 -rw-r--r--  1 root root    426 Apr 15 14:33 mesos.pc.in
19:06:21 -rw-r--r--  1 root root    162 Apr 15 14:33 NOTICE
19:06:21 -rw-r--r--  1 root root   1103 Apr 15 14:33 README.md
19:06:21 drwxr-xr-x  5 root root   4096 Jul  3 17:04 site
19:06:21 drwxr-xr-x 48 root root   4096 Jun 30 19:30 src
19:06:21 drwxr-xr-x  9 root root   4096 Jul  3 17:02 support
19:06:21 autoreconf: Entering directory `.'
19:06:21 autoreconf: configure.ac: not using Gettext
19:06:22 autoreconf: running: aclocal --warnings=all -I m4
19:06:23 autom4te: cannot create autom4te.cache: No such file or directory
{noformat}

Note that the Dockerfile specifies "USER root"
https://github.com/apache/mesos/blob/master/support/mesos-website/Dockerfile
and the permissions are dropped to the "testuser" only inside the entrypoint.sh script.
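The permissions dump in MESOS-10167 can be read mechanically. The following is only a minimal sketch of the classic Unix owner/group/other check, not code from the Mesos build scripts; the helper name can_create_entry is hypothetical, and ACLs, capabilities and read-only mounts are deliberately ignored. It shows why a process running as uid=910 cannot create autom4te.cache in a directory owned by root:root with mode drwxr-xr-x.

```python
import stat


def can_create_entry(dir_mode, dir_uid, dir_gid, uid, gids):
    """Hypothetical helper: may this process create a file in the directory?

    Creating an entry needs both write and search (execute) permission on
    the directory. Simplified model: root is always allowed; ACLs and
    capabilities are not considered.
    """
    if uid == 0:
        return True
    if uid == dir_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif dir_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (dir_mode & needed) == needed


# Values taken from the build log: "." is root:root with mode 0o755,
# and the build scripts run as uid=910 with gid=1001.
print(can_create_entry(0o755, 0, 0, uid=910, gids={1001}))
```

With the same mode but the directory owned by uid 910, the check passes, which matches the expectation stated in the issue that the mounted repositories should belong to the user the build runs as.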
[jira] [Commented] (MESOS-10143) Outstanding Offers accumulating
[ https://issues.apache.org/jira/browse/MESOS-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166505#comment-17166505 ]

Benjamin Mahler commented on MESOS-10143:

[~puneetku287] It looks like the scheduler native library is getting backlogged. This can happen when the scheduler cannot process messages as fast as they come in from the master. (In your example I see that Scheduler::resourceOffers took 6.8 ms, which is long.)

If you want to check this the next time it happens, you can hit {{http://IP:PORT/metrics/snapshot}} on the scheduler library. If the library is backlogged, the request should not return, because the scheduler metrics cannot be computed in a timely manner. You can also specify a timeout via {{http://IP:PORT/metrics/snapshot?timeout=10secs}}, in which case you should see a response without the {{scheduler/event_queue_messages}} metric present.

To do this, you may want to fix the port of the scheduler library by setting LIBPROCESS_PORT=X in your environment before instantiating the library.

> Outstanding Offers accumulating
>
> Key: MESOS-10143
> URL: https://issues.apache.org/jira/browse/MESOS-10143
> Project: Mesos
> Issue Type: Bug
> Components: master, scheduler driver
> Affects Versions: 1.7.0
> Environment: Mesos Version 1.7.0
> JDK 8.0
> Reporter: Puneet Kumar
> Priority: Minor
>
> We manage an Apache Mesos cluster version 1.7.0. We have written a framework
> in Java that schedules tasks to Mesos master at a rate of 300 TPS. Everything
> works fine for almost 24 hours but then outstanding offers accumulate &
> saturate within 15 minutes. Outstanding offers aren't reclaimed by Mesos
> master. We observe "RescindOffer" messages in verbose (GLOG v=3) framework
> logs but outstanding offers don't reduce. New resources aren't offered to
> framework when outstanding offers saturate. We have to restart the scheduler
> to reset outstanding offers to zero.
> Any suggestions to debug this issue are welcome.
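The check suggested in the MESOS-10143 comment could be scripted roughly as follows. This is a sketch under assumptions, not an official tool: the function names are hypothetical, and the host and port are whatever the scheduler library listens on (for example the value chosen for LIBPROCESS_PORT). Only the /metrics/snapshot path, the timeout=10secs query parameter and the scheduler/event_queue_messages metric name come from the comment itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

QUEUE_METRIC = "scheduler/event_queue_messages"


def snapshot_url(host, port, timeout=""):
    """Build the metrics snapshot URL, optionally with a timeout parameter."""
    base = f"http://{host}:{port}/metrics/snapshot"
    return f"{base}?{urlencode({'timeout': timeout})}" if timeout else base


def fetch_snapshot(host, port, timeout="10secs", socket_timeout=30.0):
    """Fetch and parse the snapshot (performs a real HTTP request)."""
    url = snapshot_url(host, port, timeout)
    with urlopen(url, timeout=socket_timeout) as response:
        return json.load(response)


def looks_backlogged(snapshot):
    """Per the comment: the queue metric is absent from a timed-out snapshot."""
    return QUEUE_METRIC not in snapshot


# Offline illustration with hand-written snapshots (metric values invented).
print(looks_backlogged({"some/other_metric": 1.0}))
print(looks_backlogged({QUEUE_METRIC: 0.0}))
```

In a live check one would call fetch_snapshot with the scheduler's address and pass the result to looks_backlogged; a missing metric only hints at a backlog and is not proof of one.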
[jira] [Created] (MESOS-10166) Avoid sending framework updates to agents and subscribers when frameworkInfo/pid didn't change.
Andrei Sekretenko created MESOS-10166:

Summary: Avoid sending framework updates to agents and subscribers when frameworkInfo/pid didn't change.
Key: MESOS-10166
URL: https://issues.apache.org/jira/browse/MESOS-10166
Project: Mesos
Issue Type: Task
Reporter: Andrei Sekretenko

Currently, FrameworkInfo is broadcast to agents and V1 API event subscribers on every framework resubscription/update, regardless of whether it has actually changed.

When schedulers frequently call UPDATE_FRAMEWORK only to update subscription settings (the list of suppressed roles and, once https://issues.apache.org/jira/browse/MESOS-10161 is implemented, offer constraints), the FrameworkInfo broadcast on every UPDATE_FRAMEWORK call becomes undesirable.

Sending the 'pid' update to agents is affected by the same issue.
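The idea behind MESOS-10166 can be illustrated with a small change-detection sketch. This is not the Mesos implementation (the master is written in C++ and FrameworkInfo is a protobuf message); the function update_framework, the dict representation of FrameworkInfo and the broadcast callback are all hypothetical stand-ins used only to show "broadcast only when something changed".

```python
def update_framework(current_info, new_info, broadcast):
    """Hypothetical sketch: store new_info and broadcast only on change.

    Returns the info that should be stored and whether a broadcast was sent.
    """
    if new_info == current_info:
        # Only subscription settings changed: nothing to tell agents
        # or event subscribers.
        return current_info, False
    broadcast(new_info)
    return new_info, True


sent = []
info = {"name": "example-framework", "roles": ["dev"]}

# Repeated update with identical FrameworkInfo: no broadcast.
info, did_send = update_framework(info, dict(info), sent.append)
print(did_send, len(sent))

# Update that actually changes FrameworkInfo: one broadcast.
info, did_send = update_framework(
    info, {"name": "example-framework", "roles": ["dev", "prod"]}, sent.append)
print(did_send, len(sent))
```

The same comparison would apply to the 'pid' update mentioned in the issue, with the pid compared instead of the FrameworkInfo.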
[jira] [Commented] (MESOS-10162) Constraints-based offer filtering design doc
[ https://issues.apache.org/jira/browse/MESOS-10162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166424#comment-17166424 ]

Andrei Sekretenko commented on MESOS-10162:

Design doc: https://docs.google.com/document/d/1MV048BwjLSoa8sn_5hs4kIH4YJMf6-Gsqbij3YuT1No/edit#heading=h.wq9atl6k4yq0

> Constraints-based offer filtering design doc
>
> Key: MESOS-10162
> URL: https://issues.apache.org/jira/browse/MESOS-10162
> Project: Mesos
> Issue Type: Task
> Reporter: Andrei Sekretenko
> Assignee: Andrei Sekretenko
> Priority: Major