[jira] [Created] (MESOS-10168) Add secrets support to the CSI service and volume managers

2020-07-28 Thread Greg Mann (Jira)
Greg Mann created MESOS-10168:
-

 Summary: Add secrets support to the CSI service and volume managers
 Key: MESOS-10168
 URL: https://issues.apache.org/jira/browse/MESOS-10168
 Project: Mesos
  Issue Type: Task
Reporter: Greg Mann


We must update our CSI code to pass secrets to CSI drivers when 
staging/unstaging and publishing/unpublishing volumes. To avoid writing any 
secrets to disk, the appropriate component must hold a secret resolver that 
can resolve the secrets associated with already-attached volumes during and 
after recovery.
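A minimal Python sketch of the idea follows. It assumes a hypothetical `SecretResolver` that fetches values on demand from some secret store: only secret *references* are checkpointed, and the values are resolved again when a request is built after recovery. The request dict borrows field names from the CSI `NodeStageVolumeRequest` message (whose `secrets` field is a string-to-string map), but it is only an illustration; all class and function names are hypothetical, not Mesos APIs.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class SecretRef:
    """Hypothetical reference to a secret; safe to persist."""
    name: str


class SecretResolver:
    """Hypothetical resolver that fetches secret values on demand.

    Resolved values are handed to the caller and never stored here.
    """

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch

    def resolve(self, refs: Dict[str, SecretRef]) -> Dict[str, str]:
        return {key: self._fetch(ref.name) for key, ref in refs.items()}


@dataclass
class VolumeState:
    """Checkpointed state of an attached volume (illustrative)."""
    volume_id: str
    staging_target_path: str
    secret_refs: Dict[str, SecretRef] = field(default_factory=dict)

    def checkpoint(self) -> str:
        # Only secret names are serialized, so no secret value reaches disk.
        return json.dumps({
            "volume_id": self.volume_id,
            "staging_target_path": self.staging_target_path,
            "secret_refs": {k: r.name for k, r in self.secret_refs.items()},
        })

    @staticmethod
    def recover(data: str) -> "VolumeState":
        raw = json.loads(data)
        refs = {k: SecretRef(n) for k, n in raw["secret_refs"].items()}
        return VolumeState(raw["volume_id"], raw["staging_target_path"], refs)


def node_stage_request(state: VolumeState, resolver: SecretResolver) -> dict:
    # Field names mirror CSI's NodeStageVolumeRequest; this dict is not a
    # real gRPC message, just the shape of the data handed to the driver.
    return {
        "volume_id": state.volume_id,
        "staging_target_path": state.staging_target_path,
        "secrets": resolver.resolve(state.secret_refs),
    }


# Example: checkpoint, "restart", recover, then re-resolve the secrets.
store = {"db-password": "s3cr3t"}
resolver = SecretResolver(store.__getitem__)
state = VolumeState("vol-1", "/staging/vol-1",
                    {"password": SecretRef("db-password")})
on_disk = state.checkpoint()
recovered = VolumeState.recover(on_disk)
request = node_stage_request(recovered, resolver)
```

The design choice is that the resolver lives in memory in the component that issues the CSI calls, so recovery only needs the persisted references plus access to the secret store.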



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (MESOS-10167) Mesos-websitebot fails due to wrong permissions of volumes mounted into Docker container

2020-07-28 Thread Andrei Sekretenko (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166527#comment-17166527
 ] 

Andrei Sekretenko edited comment on MESOS-10167 at 7/28/20, 4:27 PM:
-

Note that wrong permissions which prevent autom4te from creating the cache 
directory *do* cause the misleading "no such file or directory" error message.
See http://git.savannah.gnu.org/cgit/autoconf.git/tree/bin/autom4te.in#n998


was (Author: asekretenko):
Note that wrong permissions which prevent autom4te from creating the cache 
directory *do* cause the misleading "no such file or directory" error.
See http://git.savannah.gnu.org/cgit/autoconf.git/tree/bin/autom4te.in#n998

> Mesos-websitebot fails due to wrong permissions of volumes mounted into 
> Docker container
> 
>
> Key: MESOS-10167
> URL: https://issues.apache.org/jira/browse/MESOS-10167
> Project: Mesos
>  Issue Type: Bug
>  Components: project website
>Reporter: Andrei Sekretenko
>Priority: Minor
>
> Last successful run was on Apr 7:
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Websitebot/2464/
> First failure:
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Websitebot/2465/console
> Build with added permissions dump 
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Websitebot/2525/console
> shows that while the build scripts in the container are, as expected, running 
> under "tempuser" (with the same uid as the user outside the container that pulls 
> the git repositories),
> the directories with git repositories mounted into the container are owned by 
> root:
> {noformat}
> 19:06:21 uid=910(tempuser) gid=1001(tempuser) groups=1001(tempuser)
> 19:06:21 total 836
> 19:06:21 drwxr-xr-x 12 root root   4096 Jul  3 17:02 .
> 19:06:21 drwxr-xr-x  1 root root   4096 Jul  3 17:04 ..
> 19:06:21 drwxr-xr-x  6 root root   4096 Jun 29 14:12 3rdparty
> 19:06:21 drwxr-xr-x  2 root root   4096 Apr 15 14:33 bin
> 19:06:21 -rwxr-xr-x  1 root root   1294 Jul  3 17:02 bootstrap
> 19:06:21 -rw-r--r--  1 root root 536015 May 29 09:21 CHANGELOG
> 19:06:21 drwxr-xr-x  2 root root   4096 May 29 11:30 cmake
> 19:06:21 -rw-r--r--  1 root root   3990 May  7 13:40 CMakeLists.txt
> 19:06:21 -rw-r--r--  1 root root 105737 May  7 13:40 configure.ac
> 19:06:21 lrwxrwxrwx  1 root root     31 Apr 15 14:33 CONTRIBUTING.md -> ./docs/beginner-contribution.md
> 19:06:21 drwxr-xr-x  6 root root   4096 May 28 19:18 docs
> 19:06:21 -rw-r--r--  1 root root  63778 Apr 15 14:33 Doxyfile
> 19:06:21 drwxr-xr-x  8 root root   4096 Jul  3 17:02 .git
> 19:06:21 -rw-r--r--  1 root root     99 Apr 15 14:33 .gitattributes
> 19:06:21 drwxr-xr-x  3 root root   4096 Aug 27  2019 include
> 19:06:21 -rw-r--r--  1 root root  66156 Apr 15 14:33 LICENSE
> 19:06:21 drwxr-xr-x  2 root root   4096 Apr 15 14:33 m4
> 19:06:21 -rw-r--r--  1 root root   3842 Apr 15 14:33 Makefile.am
> 19:06:21 -rw-r--r--  1 root root    426 Apr 15 14:33 mesos.pc.in
> 19:06:21 -rw-r--r--  1 root root    162 Apr 15 14:33 NOTICE
> 19:06:21 -rw-r--r--  1 root root   1103 Apr 15 14:33 README.md
> 19:06:21 drwxr-xr-x  5 root root   4096 Jul  3 17:04 site
> 19:06:21 drwxr-xr-x 48 root root   4096 Jun 30 19:30 src
> 19:06:21 drwxr-xr-x  9 root root   4096 Jul  3 17:02 support
> 19:06:21 autoreconf: Entering directory `.'
> 19:06:21 autoreconf: configure.ac: not using Gettext
> 19:06:22 autoreconf: running: aclocal --warnings=all -I m4
> 19:06:23 autom4te: cannot create autom4te.cache: No such file or directory
> {noformat}
> Note that the Dockerfile specifies "USER root" 
> https://github.com/apache/mesos/blob/master/support/mesos-website/Dockerfile 
> and privileges are dropped to "tempuser" only inside the 
> entrypoint.sh script.





[jira] [Commented] (MESOS-10167) Mesos-websitebot fails due to wrong permissions of volumes mounted into Docker container

2020-07-28 Thread Andrei Sekretenko (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166527#comment-17166527
 ] 

Andrei Sekretenko commented on MESOS-10167:
---

Note that wrong permissions which prevent autom4te from creating the cache 
directory *do* cause the misleading "no such file or directory" error.
See http://git.savannah.gnu.org/cgit/autoconf.git/tree/bin/autom4te.in#n998






[jira] [Commented] (MESOS-10167) Mesos-websitebot fails due to wrong permissions of volumes mounted into Docker container

2020-07-28 Thread Andrei Sekretenko (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166516#comment-17166516
 ] 

Andrei Sekretenko commented on MESOS-10167:
---

I haven't managed to reproduce this locally on my system - neither with Docker 
19.03.6, build 369ce74a3c, nor with 19.03.8, build afacb8b7f.






[jira] [Created] (MESOS-10167) Mesos-websitebot fails due to wrong permissions of volumes mounted into Docker container

2020-07-28 Thread Andrei Sekretenko (Jira)
Andrei Sekretenko created MESOS-10167:
-

 Summary: Mesos-websitebot fails due to wrong permissions of 
volumes mounted into Docker container
 Key: MESOS-10167
 URL: https://issues.apache.org/jira/browse/MESOS-10167
 Project: Mesos
  Issue Type: Bug
  Components: project website
Reporter: Andrei Sekretenko


Last successful run was on Apr 7:
https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Websitebot/2464/

First failure:
https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Websitebot/2465/console

Build with added permissions dump 
https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Websitebot/2525/console
shows that while the build scripts in the container are, as expected, running 
under "tempuser" (with the same uid as the user outside the container that pulls 
the git repositories),
the directories with git repositories mounted into the container are owned by 
root:

{noformat}
19:06:21 uid=910(tempuser) gid=1001(tempuser) groups=1001(tempuser)
19:06:21 total 836
19:06:21 drwxr-xr-x 12 root root   4096 Jul  3 17:02 .
19:06:21 drwxr-xr-x  1 root root   4096 Jul  3 17:04 ..
19:06:21 drwxr-xr-x  6 root root   4096 Jun 29 14:12 3rdparty
19:06:21 drwxr-xr-x  2 root root   4096 Apr 15 14:33 bin
19:06:21 -rwxr-xr-x  1 root root   1294 Jul  3 17:02 bootstrap
19:06:21 -rw-r--r--  1 root root 536015 May 29 09:21 CHANGELOG
19:06:21 drwxr-xr-x  2 root root   4096 May 29 11:30 cmake
19:06:21 -rw-r--r--  1 root root   3990 May  7 13:40 CMakeLists.txt
19:06:21 -rw-r--r--  1 root root 105737 May  7 13:40 configure.ac
19:06:21 lrwxrwxrwx  1 root root     31 Apr 15 14:33 CONTRIBUTING.md -> ./docs/beginner-contribution.md
19:06:21 drwxr-xr-x  6 root root   4096 May 28 19:18 docs
19:06:21 -rw-r--r--  1 root root  63778 Apr 15 14:33 Doxyfile
19:06:21 drwxr-xr-x  8 root root   4096 Jul  3 17:02 .git
19:06:21 -rw-r--r--  1 root root     99 Apr 15 14:33 .gitattributes
19:06:21 drwxr-xr-x  3 root root   4096 Aug 27  2019 include
19:06:21 -rw-r--r--  1 root root  66156 Apr 15 14:33 LICENSE
19:06:21 drwxr-xr-x  2 root root   4096 Apr 15 14:33 m4
19:06:21 -rw-r--r--  1 root root   3842 Apr 15 14:33 Makefile.am
19:06:21 -rw-r--r--  1 root root    426 Apr 15 14:33 mesos.pc.in
19:06:21 -rw-r--r--  1 root root    162 Apr 15 14:33 NOTICE
19:06:21 -rw-r--r--  1 root root   1103 Apr 15 14:33 README.md
19:06:21 drwxr-xr-x  5 root root   4096 Jul  3 17:04 site
19:06:21 drwxr-xr-x 48 root root   4096 Jun 30 19:30 src
19:06:21 drwxr-xr-x  9 root root   4096 Jul  3 17:02 support
19:06:21 autoreconf: Entering directory `.'
19:06:21 autoreconf: configure.ac: not using Gettext
19:06:22 autoreconf: running: aclocal --warnings=all -I m4
19:06:23 autom4te: cannot create autom4te.cache: No such file or directory
{noformat}

Note that the Dockerfile specifies "USER root" 
https://github.com/apache/mesos/blob/master/support/mesos-website/Dockerfile 
and privileges are dropped to "tempuser" only inside the entrypoint.sh 
script.
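To confirm the ownership mismatch from inside the container before autoreconf runs, a small check along these lines could be added to the build scripts. It is a sketch assuming classic POSIX permission bits only (no ACLs, no user-namespace remapping, no capabilities other than running as root), and the function names are hypothetical. Fed the values from the permissions dump above (a root:root directory with mode drwxr-xr-x, and uid=910, gid=1001), it reports that the build user cannot create entries such as autom4te.cache.

```python
import os
import stat
from typing import Iterable


def can_create_entries(st_uid: int, st_gid: int, st_mode: int,
                       uid: int, gids: Iterable[int]) -> bool:
    """Return True if a process with `uid`/`gids` may create files in a
    directory with the given owner, group and mode (classic POSIX rules)."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == st_uid:
        bits = (st_mode >> 6) & 0o7  # owner class
    elif st_gid in set(gids):
        bits = (st_mode >> 3) & 0o7  # group class
    else:
        bits = st_mode & 0o7         # "other" class
    # Creating a directory entry needs both write (2) and search (1) bits.
    return (bits & 0o3) == 0o3


def check_dir(path: str) -> bool:
    """Print and return whether the current process can create entries in path."""
    st = os.stat(path)
    ok = can_create_entries(st.st_uid, st.st_gid, st.st_mode,
                            os.getuid(), [os.getgid(), *os.getgroups()])
    print(f"{path}: owner={st.st_uid}:{st.st_gid} "
          f"mode={stat.filemode(st.st_mode)} can_create={ok}")
    return ok


# Values taken from the permissions dump: root:root, drwxr-xr-x, uid 910.
dir_mode = stat.S_IFDIR | 0o755
blocked = can_create_entries(0, 0, dir_mode, uid=910, gids=[1001])
```

Calling `check_dir(".")` right before `autoreconf` would print the same information as the dump, plus an explicit verdict, instead of relying on autom4te's error text.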






[jira] [Commented] (MESOS-10143) Outstanding Offers accumulating

2020-07-28 Thread Benjamin Mahler (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166505#comment-17166505
 ] 

Benjamin Mahler commented on MESOS-10143:
-

[~puneetku287] It looks like the scheduler native library is getting 
backlogged; this can happen when the scheduler cannot process messages as fast 
as they arrive from the master. (In your example, I see that 
Scheduler::resourceOffers took 6.8 ms, which is long.)

The next time this happens, you can check it by hitting 
{{http://IP:PORT/metrics/snapshot}} of the scheduler library: the request 
shouldn't return, because the scheduler metrics cannot be computed in a timely 
manner. You can also specify a timeout via 
{{http://IP:PORT/metrics/snapshot?timeout=10secs}}, in which case you should 
see a response without the {{scheduler/event_queue_messages}} metric.

To do this, you may want to fix the port of the scheduler library by setting 
LIBPROCESS_PORT=X in your environment before instantiating the library.
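For example, a small Python probe along these lines could be run against the scheduler process. The host and port are placeholders (the port being whatever LIBPROCESS_PORT was set to), and the helper names are hypothetical; only the endpoint, the `timeout` query parameter and the metric name come from the comment above. It passes the server-side `timeout` and also sets a slightly longer client-side socket timeout, then treats a missing `scheduler/event_queue_messages` key as the backlog symptom.

```python
import json
import urllib.request
from typing import Optional

METRIC = "scheduler/event_queue_messages"


def fetch_metrics(host: str, port: int, server_timeout: str = "10secs",
                  client_timeout: float = 15.0) -> dict:
    """Fetch /metrics/snapshot, bounding the wait on both ends."""
    url = f"http://{host}:{port}/metrics/snapshot?timeout={server_timeout}"
    with urllib.request.urlopen(url, timeout=client_timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def event_queue_size(snapshot: dict) -> Optional[float]:
    """Return the queued-message count, or None when the metric was left
    out because it could not be computed before the timeout expired
    (a sign that the scheduler library is backlogged)."""
    return snapshot.get(METRIC)
```

Usage would look like `event_queue_size(fetch_metrics("10.0.0.5", 9090))` with your own address; a `None` result, or a client-side timeout when the `timeout` parameter is omitted, points at the backlog.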

> Outstanding Offers accumulating
> ---
>
> Key: MESOS-10143
> URL: https://issues.apache.org/jira/browse/MESOS-10143
> Project: Mesos
>  Issue Type: Bug
>  Components: master, scheduler driver
>Affects Versions: 1.7.0
> Environment: Mesos Version 1.7.0
> JDK 8.0
>Reporter: Puneet Kumar
>Priority: Minor
>
> We manage an Apache Mesos cluster version 1.7.0. We have written a framework 
> in Java that schedules tasks to Mesos master at a rate of 300 TPS. Everything 
> works fine for almost 24 hours but then outstanding offers accumulate & 
> saturate within 15 minutes. Outstanding offers aren't reclaimed by Mesos 
> master. We observe "RescindOffer" messages in verbose (GLOG v=3) framework 
> logs but outstanding offers don't reduce. New resources aren't offered to 
> framework when outstanding offers saturate. We have to restart the scheduler 
> to reset outstanding offers to zero.
> Any suggestions to debug this issue are welcome.





[jira] [Created] (MESOS-10166) Avoid sending framework updates to agents and subscribers when frameworkInfo/pid didn't change.

2020-07-28 Thread Andrei Sekretenko (Jira)
Andrei Sekretenko created MESOS-10166:
-

 Summary: Avoid sending framework updates to agents and subscribers 
when frameworkInfo/pid didn't change.
 Key: MESOS-10166
 URL: https://issues.apache.org/jira/browse/MESOS-10166
 Project: Mesos
  Issue Type: Task
Reporter: Andrei Sekretenko


Currently, FrameworkInfo is broadcast to agents and V1API events subscribers on 
every framework resubscription/update, regardless of whether it has actually 
changed or not.

When schedulers frequently call UPDATE_FRAMEWORK to only update subscription 
settings (the list of suppressed roles, and, after implementing 
https://issues.apache.org/jira/browse/MESOS-10161, offer constraints), the 
FrameworkInfo broadcast on every UPDATE_FRAMEWORK call becomes undesirable.

Sending 'pid' updates to agents is affected by the same issue.
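One way to express the proposed behavior is to remember the last FrameworkInfo and pid sent for each framework and skip the broadcast when an update leaves both unchanged, while still applying subscription settings such as suppressed roles locally. The Python sketch below uses hypothetical class and method names and a heavily trimmed FrameworkInfo; it is not the master's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class FrameworkInfo:
    """Hypothetical, heavily trimmed stand-in for the FrameworkInfo message."""
    name: str
    roles: FrozenSet[str]


# Callback that sends an update to agents and V1 API subscribers.
Broadcast = Callable[[str, FrameworkInfo, Optional[str]], None]


class FrameworkRegistry:
    """Sketch: broadcast FrameworkInfo/pid only when either actually changed."""

    def __init__(self, broadcast: Broadcast):
        self._broadcast = broadcast
        self._last_sent: Dict[str, Tuple[FrameworkInfo, Optional[str]]] = {}
        # Subscription settings are applied locally and never broadcast.
        self._suppressed_roles: Dict[str, FrozenSet[str]] = {}

    def update_framework(self, framework_id: str, info: FrameworkInfo,
                         pid: Optional[str],
                         suppressed_roles: FrozenSet[str]) -> bool:
        """Apply an UPDATE_FRAMEWORK-style call; return True if broadcast."""
        self._suppressed_roles[framework_id] = suppressed_roles
        current = (info, pid)
        if self._last_sent.get(framework_id) == current:
            return False  # nothing agents or subscribers need to hear about
        self._last_sent[framework_id] = current
        self._broadcast(framework_id, info, pid)
        return True


# Example: the second call only changes suppressed roles, so nothing is sent.
sent: List[str] = []
registry = FrameworkRegistry(lambda fid, info, pid: sent.append(fid))
info = FrameworkInfo("my-framework", frozenset({"dev"}))
registry.update_framework("fw-1", info, None, frozenset())
registry.update_framework("fw-1", info, None, frozenset({"dev"}))
```

Comparing the (FrameworkInfo, pid) pair as a whole keeps the check cheap and covers both the FrameworkInfo and the 'pid' cases with one rule.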



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (MESOS-10162) Constraints-based offer filtering design doc

2020-07-28 Thread Andrei Sekretenko (Jira)


[ 
https://issues.apache.org/jira/browse/MESOS-10162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166424#comment-17166424
 ] 

Andrei Sekretenko commented on MESOS-10162:
---

Design doc: 
https://docs.google.com/document/d/1MV048BwjLSoa8sn_5hs4kIH4YJMf6-Gsqbij3YuT1No/edit#heading=h.wq9atl6k4yq0

> Constraints-based offer filtering design doc
> 
>
> Key: MESOS-10162
> URL: https://issues.apache.org/jira/browse/MESOS-10162
> Project: Mesos
>  Issue Type: Task
>Reporter: Andrei Sekretenko
>Assignee: Andrei Sekretenko
>Priority: Major
>



