[jira] [Commented] (MESOS-6183) mesos's Unified Containerizer cannot set "--insecure-registry" when provisioning images
[ https://issues.apache.org/jira/browse/MESOS-6183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498298#comment-15498298 ]

yongyu commented on MESOS-6183:
---

I am launching Docker images. When I run them through the Docker daemon started with "--insecure-registry :5000", it works. But when I use the Mesos containerizer to launch them, pulling from the private registry is not supported.

> mesos's Unified Containerizer cannot set "--insecure-registry" when
> provisioning images
> ---
>
> Key: MESOS-6183
> URL: https://issues.apache.org/jira/browse/MESOS-6183
> Project: Mesos
> Issue Type: Bug
>Reporter: yongyu
>Priority: Minor
>
> mesos's Unified Containerizer cannot set "--insecure-registry" when
> provisioning images

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498217#comment-15498217 ]

Aaron Wood commented on MESOS-6127:
---

Thanks, that would be great! Looks like http-parser still doesn't support HTTP/2.

> Implement suppport for HTTP/2
> -
>
> Key: MESOS-6127
> URL: https://issues.apache.org/jira/browse/MESOS-6127
> Project: Mesos
> Issue Type: Epic
> Components: HTTP API, libprocess
>Reporter: Aaron Wood
> Labels: performance
>
> HTTP/2 will allow us to take advantage of connection multiplexing, header
> compression, streams, server push, etc. Add support for communication over
> HTTP/2 between masters and agents, framework endpoints, etc.
> Should we support HTTP/2 without TLS? The spec allows for this but most major
> browser vendors, libraries, and implementations aren't supporting it unless
> TLS is used. If we do require TLS, what can be done to reduce the performance
> hit of the TLS handshake? Might need to change more code to make sure that we
> are taking advantage of connection sharing so that we can (ideally) only ever
> have a one-time TLS handshake per shared connection.
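HTTP/1.x parsers such as http-parser work on textual request lines and headers, whereas HTTP/2 carries everything in binary frames, and multiplexing works because every frame names the stream it belongs to. As a rough illustration only (a standalone Python sketch, not libprocess code; `parse_frame_header`, `build_frame_header` and `demux` are hypothetical names), the fixed 9-byte frame header from RFC 7540, Section 4.1 can be decoded and used to reassemble interleaved streams:

```python
import struct
from typing import Dict, NamedTuple

DATA, HEADERS = 0x0, 0x1  # frame type codes from RFC 7540, Section 6


class FrameHeader(NamedTuple):
    length: int     # 24-bit payload length
    type: int       # 8-bit frame type
    flags: int      # 8-bit flags
    stream_id: int  # 31-bit stream identifier


def parse_frame_header(buf: bytes) -> FrameHeader:
    """Decode the fixed 9-byte HTTP/2 frame header (RFC 7540, Section 4.1)."""
    if len(buf) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes")
    (length,) = struct.unpack(">I", b"\x00" + buf[0:3])  # 24-bit big-endian
    (raw_stream,) = struct.unpack(">I", buf[5:9])
    # The high bit of the stream field is reserved and must be ignored.
    return FrameHeader(length, buf[3], buf[4], raw_stream & 0x7FFFFFFF)


def build_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """Inverse of parse_frame_header (reserved bit left unset)."""
    if not 0 <= length < 1 << 24:
        raise ValueError("length must fit in 24 bits")
    return (struct.pack(">I", length)[1:] + bytes([frame_type, flags])
            + struct.pack(">I", stream_id & 0x7FFFFFFF))


def demux(data: bytes) -> Dict[int, bytes]:
    """Reassemble payloads per stream from interleaved frames on one connection."""
    streams: Dict[int, bytes] = {}
    offset = 0
    while offset + 9 <= len(data):
        hdr = parse_frame_header(data[offset:offset + 9])
        end = offset + 9 + hdr.length
        if end > len(data):
            break  # partial frame: a real server would wait for more bytes
        streams[hdr.stream_id] = streams.get(hdr.stream_id, b"") + data[offset + 9:end]
        offset = end
    return streams
```

For example, DATA frames for stream 1, stream 3 and then stream 1 again, sent back to back on one connection, come out of `demux` as two separate streams, which is the connection multiplexing the description refers to. A production implementation would also have to handle HPACK header compression, flow control and the other frame types, which is where a library such as libnghttp2 comes in.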
[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early
[ https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497951#comment-15497951 ]

Greg Mann commented on MESOS-6180:
--

Thanks for the patch to address the mount leak, [~jieyu]! (https://reviews.apache.org/r/51963/)

I ran {{sudo MESOS_VERBOSE=1 GLOG_v=2 GTEST_REPEAT=-1 GTEST_BREAK_ON_FAILURE=1 GTEST_FILTER="*MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespace*" bin/mesos-tests.sh}} and stressed my machine with {{stress -c N -i N -m N -d 1}}, where {{N}} is the number of cores, and I was able to reproduce a couple of these offer future timeout failures after a few tens of repetitions. I attached logs above as {{flaky-containerizer-pid-namespace-forward.txt}} and {{flaky-containerizer-pid-namespace-backward.txt}}.

We can see the master beginning agent registration, but we never see the line {{Registered agent ...}} from {{Master::_registerSlave()}}, which would indicate that registration is complete and the registered message has been sent to the agent:

{code}
I0917 01:35:17.184216 480 master.cpp:4886] Registering agent at slave(11)@172.31.1.104:57341 (ip-172-31-1-104.us-west-2.compute.internal) with id fa7a42d0-5d0c-4799-b19f-2a85b43039f3-S0
I0917 01:35:17.184232 474 process.cpp:2707] Resuming __reaper__(1)@172.31.1.104:57341 at 2016-09-17 01:35:17.184222976+00:00
I0917 01:35:17.184377 474 process.cpp:2707] Resuming registrar(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.184371968+00:00
I0917 01:35:17.184554 474 registrar.cpp:464] Applied 1 operations in 79217ns; attempting to update the registry
I0917 01:35:17.184953 474 process.cpp:2697] Spawned process __latch__(141)@172.31.1.104:57341
I0917 01:35:17.184990 485 process.cpp:2707] Resuming log-storage(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.184982016+00:00
I0917 01:35:17.185561 485 process.cpp:2707] Resuming log-writer(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.185552896+00:00
I0917 01:35:17.185609 485 log.cpp:577] Attempting to append 434 bytes to the log
I0917 01:35:17.185804 485 process.cpp:2707] Resuming log-coordinator(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.185797888+00:00
I0917 01:35:17.185863 485 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3
I0917 01:35:17.185998 485 process.cpp:2697] Spawned process log-write(29)@172.31.1.104:57341
I0917 01:35:17.186030 475 process.cpp:2707] Resuming log-write(29)@172.31.1.104:57341 at 2016-09-17 01:35:17.186021888+00:00
I0917 01:35:17.186189 475 process.cpp:2707] Resuming log-network(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.186182912+00:00
I0917 01:35:17.186275 475 process.cpp:2707] Resuming log-write(29)@172.31.1.104:57341 at 2016-09-17 01:35:17.186267904+00:00
I0917 01:35:17.186424 475 process.cpp:2707] Resuming log-network(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.186416896+00:00
I0917 01:35:17.186575 475 process.cpp:2697] Spawned process __req_res__(55)@172.31.1.104:57341
I0917 01:35:17.186724 475 process.cpp:2707] Resuming log-write(29)@172.31.1.104:57341 at 2016-09-17 01:35:17.186717952+00:00
I0917 01:35:17.186609 485 process.cpp:2707] Resuming __req_res__(55)@172.31.1.104:57341 at 2016-09-17 01:35:17.186601984+00:00
I0917 01:35:17.186898 485 process.cpp:2707] Resuming log-replica(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.186892032+00:00
I0917 01:35:17.186962 485 replica.cpp:537] Replica received write request for position 3 from __req_res__(55)@172.31.1.104:57341
I0917 01:35:17.185014 471 process.cpp:2707] Resuming __gc__@172.31.1.104:57341 at 2016-09-17 01:35:17.185008896+00:00
I0917 01:35:17.185036 480 process.cpp:2707] Resuming __latch__(141)@172.31.1.104:57341 at 2016-09-17 01:35:17.185029120+00:00
I0917 01:35:17.196358 482 process.cpp:2707] Resuming slave(11)@172.31.1.104:57341 at 2016-09-17 01:35:17.196335104+00:00
I0917 01:35:17.196900 482 slave.cpp:1471] Will retry registration in 25.224033ms if necessary
I0917 01:35:17.197029 482 process.cpp:2707] Resuming master@172.31.1.104:57341 at 2016-09-17 01:35:17.197024000+00:00
I0917 01:35:17.197157 482 master.cpp:4874] Ignoring register agent message from slave(11)@172.31.1.104:57341 (ip-172-31-1-104.us-west-2.compute.internal) as admission is already in progress
I0917 01:35:17.224309 482 process.cpp:2707] Resuming slave(11)@172.31.1.104:57341 at 2016-09-17 01:35:17.224284928+00:00
I0917 01:35:17.224845 482 slave.cpp:1471] Will retry registration in 63.510932ms if necessary
I0917 01:35:17.224900 475 process.cpp:2707] Resuming master@172.31.1.104:57341 at 2016-09-17 01:35:17.224888064+00:00
I0917 01:35:17.225109 475 master.cpp:4874] Ignoring register agent message from slave(11)@172.31.1.104:57341 (ip-172-31-1-104.us-west-2.compute.internal) as admission is already in progress
{code}

> Several tests are flaky, with futures timing out early
> ---
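The pattern in this log can be modeled roughly as follows: the agent keeps re-sending its registration after a randomized, growing delay, while the master drops every duplicate for as long as the first admission is still waiting on the registry write. This is a simplified Python sketch, not the actual master.cpp/slave.cpp logic; `ToyMaster`, `retry_delays`, the doubling policy and the cap are all assumptions chosen for illustration.

```python
import random
from typing import List, Optional, Set


class ToyMaster:
    """Hypothetical model of a master's admission guard (not Mesos code)."""

    def __init__(self) -> None:
        self.admitting: Set[str] = set()   # registry update still pending
        self.registered: Set[str] = set()

    def on_register(self, agent_id: str) -> str:
        if agent_id in self.registered:
            return "already registered"
        if agent_id in self.admitting:
            # Cf. "Ignoring register agent message ... as admission is already in progress".
            return "ignored"
        self.admitting.add(agent_id)  # kick off the asynchronous registry write
        return "admitting"

    def on_registry_updated(self, agent_id: str) -> None:
        # Only when the registrar's write finishes does the agent become registered.
        self.admitting.discard(agent_id)
        self.registered.add(agent_id)


def retry_delays(initial_max: float, cap: float, attempts: int,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Assumed agent policy: random delay in [0, bound], bound doubles up to cap."""
    rng = rng or random.Random()
    delays: List[float] = []
    bound = initial_max
    for _ in range(attempts):
        delays.append(rng.uniform(0.0, bound))
        bound = min(bound * 2, cap)
    return delays
```

In this model, if `on_registry_updated` is never called (the log shows the replica receiving the write for position 3, but no completion), every retry is answered with "ignored", no offers are ever made, and the test's offer future eventually times out.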
[jira] [Updated] (MESOS-6180) Several tests are flaky, with futures timing out early
[ https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Mann updated MESOS-6180:
-
Attachment: flaky-containerizer-pid-namespace-forward.txt
            flaky-containerizer-pid-namespace-backward.txt

> Several tests are flaky, with futures timing out early
> --
>
> Key: MESOS-6180
> URL: https://issues.apache.org/jira/browse/MESOS-6180
> Project: Mesos
> Issue Type: Bug
> Components: tests
>Reporter: Greg Mann
>Assignee: haosdent
> Labels: mesosphere, tests
> Attachments: CGROUPS_ROOT_PidNamespaceBackward.log,
> CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log,
> flaky-containerizer-pid-namespace-backward.txt,
> flaky-containerizer-pid-namespace-forward.txt
>
>
> Following the merging of a large patch chain, it was noticed on our internal
> CI that several tests had become flaky, with a similar pattern in the
> failures: the tests fail early when a future times out. Often, this occurs
> when a test cluster is being spun up and one of the offer futures times out.
> This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> See the linked JIRAs noted above for individual tickets addressing a couple
> of these.
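The failure mode described here, a future that would eventually be satisfied but not before a fixed deadline, can be reproduced in miniature. This Python sketch is illustrative only: `await_ready` and `delayed_offer` are hypothetical names standing in for a test's await-with-timeout helper and for cluster start-up work that a loaded machine delays; they are not Mesos or gtest APIs.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def await_ready(future: Future, timeout_s: float):
    """Hypothetical test helper: return the result or fail after timeout_s."""
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise AssertionError(f"future not ready after {timeout_s}s") from None


def delayed_offer(executor: ThreadPoolExecutor, gate: threading.Event, offer: str) -> Future:
    """Produce 'offer' only after 'gate' is set; stands in for cluster start-up
    work that a heavily loaded machine can delay arbitrarily."""
    def produce() -> str:
        gate.wait()
        return offer
    return executor.submit(produce)
```

The same future succeeds once the work is allowed to finish, which is why such failures look like flakiness rather than a deterministic bug. Raising the deadline only helps when the work is merely slow, not when something upstream is genuinely stuck.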
[jira] [Commented] (MESOS-5821) Clean up the billions of compiler warnings on MSVC
[ https://issues.apache.org/jira/browse/MESOS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497843#comment-15497843 ]

Joseph Wu commented on MESOS-5821:
--

These two commits brought the ASF Windows CI's warnings from
{code}
4478 Warning(s)
{code}
to
{code}
3671 Warning(s)
{code}
Admittedly, this is a "dirty" build (so not all the warnings show up), but it is good progress nonetheless: https://builds.apache.org/job/Mesos-Windows/523/

> Clean up the billions of compiler warnings on MSVC
> --
>
> Key: MESOS-5821
> URL: https://issues.apache.org/jira/browse/MESOS-5821
> Project: Mesos
> Issue Type: Bug
> Components: slave
>Reporter: Alex Clemmer
>Assignee: Daniel Pravat
> Labels: mesosphere, slave
>
> Clean builds of Mesos on Windows will result in approximately {{5800
> Warning(s)}} or more.
[jira] [Comment Edited] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497790#comment-15497790 ]

Anand Mazumdar edited comment on MESOS-6127 at 9/17/16 12:18 AM:
-

I worked on trying to add HTTP2 support partially at this year's MesosCon NA hackathon. You would still need a library that can understand/parse HTTP2, e.g., libnghttp2, as you had pointed out earlier. The one we eventually choose should be ideally weighed in the design doc. Currently, Mesos uses the [http-parser|https://github.com/nodejs/http-parser] library, which did not understand HTTP2 the last time I had a look at it.

I would be happy to shepherd the work for putting the design document in place for sharing it with the community.

was (Author: anandmazumdar):
I tried to work on trying to add HTTP2 support partially at this year's MesosCon NA hackathon. You would still need a library that can understand/parse HTTP2, e.g., libnghttp2, as you had pointed out earlier. The one we eventually choose should be ideally weighed in the design doc. Currently, Mesos uses the [http-parser|https://github.com/nodejs/http-parser] library, which did not understand HTTP2 the last time I had a look at it.

I would be happy to shepherd the work for putting the design document in place for sharing it with the community.
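Since http-parser only understands HTTP/1.x, one way to keep it for existing traffic while adding an HTTP/2 library is to route each new cleartext connection based on its first bytes. A minimal Python sketch, assuming clients use HTTP/2 "prior knowledge" and therefore begin with the 24-octet connection preface defined in RFC 7540, Section 3.5; `classify_connection` is a hypothetical name and this is not how libprocess dispatches connections today. Over TLS, the protocol would instead be negotiated through ALPN using the "h2" identifier.

```python
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"  # RFC 7540, Section 3.5 (24 octets)


def classify_connection(first_bytes: bytes) -> str:
    """Decide how to handle a new cleartext connection from its first bytes.

    Returns "h2" if the client sent the full HTTP/2 connection preface,
    "need-more" if the bytes so far are only a prefix of it, and
    "http/1.x" otherwise (hand the bytes to an HTTP/1.x parser).
    """
    n = min(len(first_bytes), len(HTTP2_PREFACE))
    if first_bytes[:n] != HTTP2_PREFACE[:n]:
        return "http/1.x"
    if len(first_bytes) >= len(HTTP2_PREFACE):
        return "h2"
    return "need-more"
```

Because no valid HTTP/1.x request line starts with `PRI * HTTP/2.0`, the check is unambiguous once enough bytes have arrived; the "need-more" case matters because a TCP read may deliver only part of the preface.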
[jira] [Commented] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497790#comment-15497790 ]

Anand Mazumdar commented on MESOS-6127:
---

I tried to work on trying to add HTTP2 support partially at this year's MesosCon NA hackathon. You would still need a library that can understand/parse HTTP2, e.g., libnghttp2, as you had pointed out earlier. The one we eventually choose should be ideally be weighed in the design doc. Currently, Mesos uses the [http-parser|https://github.com/nodejs/http-parser] library, which did not understand HTTP2 the last time I had a look at it.

I would be happy to shepherd the work for putting the design document in place for sharing it with the community.
[jira] [Comment Edited] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497790#comment-15497790 ]

Anand Mazumdar edited comment on MESOS-6127 at 9/17/16 12:15 AM:
-

I tried to work on trying to add HTTP2 support partially at this year's MesosCon NA hackathon. You would still need a library that can understand/parse HTTP2, e.g., libnghttp2, as you had pointed out earlier. The one we eventually choose should be ideally weighed in the design doc. Currently, Mesos uses the [http-parser|https://github.com/nodejs/http-parser] library, which did not understand HTTP2 the last time I had a look at it.

I would be happy to shepherd the work for putting the design document in place for sharing it with the community.

was (Author: anandmazumdar):
I tried to work on trying to add HTTP2 support partially at this year's MesosCon NA hackathon. You would still need a library that can understand/parse HTTP2, e.g., libnghttp2, as you had pointed out earlier. The one we eventually choose should be ideally be weighed in the design doc. Currently, Mesos uses the [http-parser|https://github.com/nodejs/http-parser] library, which did not understand HTTP2 the last time I had a look at it.

I would be happy to shepherd the work for putting the design document in place for sharing it with the community.
[jira] [Comment Edited] (MESOS-5821) Clean up the billions of compiler warnings on MSVC
[ https://issues.apache.org/jira/browse/MESOS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497785#comment-15497785 ]

Joseph Wu edited comment on MESOS-5821 at 9/17/16 12:10 AM:

{code}
commit 862da54368841adf23be83e0eddd050b20733948
Author: Daniel Pravat
Date:   Fri Sep 16 17:03:32 2016 -0700

    Windows: Disabled some deprecated function warnings.

    Visual Studio emits warnings for using deprecated functions in CRT and the use of insecure functions in CRT. This commit supresses the warning generation temporarily.

    Review: https://reviews.apache.org/r/51860/
{code}

{code}
commit 45fea210711c7a2e3c0e2cdacd5ca29d07453888
Author: Daniel Pravat
Date:   Fri Sep 16 17:06:16 2016 -0700

    Windows: Removed macro redefinition.

    The `__STRINGIZE` macro is already defined in Visual Studio headers.

    Review: https://reviews.apache.org/r/51861/
{code}

was (Author: kaysoky):
{code}
commit 862da54368841adf23be83e0eddd050b20733948
Author: Daniel Pravat
Date:   Fri Sep 16 17:03:32 2016 -0700

    Windows: Disabled some deprecated function warnings.

    Visual Studio emits warnings for using deprecated functions in CRT and the use of insecure functions in CRT. This commit supresses the warning generation temporarily.

    Review: https://reviews.apache.org/r/51860/
{code}

{code}
commit 45fea210711c7a2e3c0e2cdacd5ca29d07453888
Author: Daniel Pravat
Date:   Fri Sep 16 17:06:16 2016 -0700

    Windows: Removed macro redefinition.

    The `__STRINGIZE` macro is already defined in Visual Studio headers.

    Review: https://reviews.apache.org/r/51861/
[code}
[jira] [Commented] (MESOS-5821) Clean up the billions of compiler warnings on MSVC
[ https://issues.apache.org/jira/browse/MESOS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497785#comment-15497785 ]

Joseph Wu commented on MESOS-5821:
--

{code}
commit 862da54368841adf23be83e0eddd050b20733948
Author: Daniel Pravat
Date:   Fri Sep 16 17:03:32 2016 -0700

    Windows: Disabled some deprecated function warnings.

    Visual Studio emits warnings for using deprecated functions in CRT and the use of insecure functions in CRT. This commit supresses the warning generation temporarily.

    Review: https://reviews.apache.org/r/51860/
{code}

{code}
commit 45fea210711c7a2e3c0e2cdacd5ca29d07453888
Author: Daniel Pravat
Date:   Fri Sep 16 17:06:16 2016 -0700

    Windows: Removed macro redefinition.

    The `__STRINGIZE` macro is already defined in Visual Studio headers.

    Review: https://reviews.apache.org/r/51861/
{code}
[jira] [Commented] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497649#comment-15497649 ]

Aaron Wood commented on MESOS-6127:
---

Sounds good. If you think we should start with HTTP/2 and leave gRPC for later, what are your thoughts on using libnghttp2_asio vs. implementing support directly in what exists in http.cpp and http.hpp? There seems to be a lot of custom implementation there.
[jira] [Comment Edited] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497621#comment-15497621 ]

Aaron Wood edited comment on MESOS-6127 at 9/16/16 10:53 PM:
-

Also, should these changes go upstream to https://github.com/3rdparty/libprocess and/or directly into Mesos?

was (Author: aaronjwood):
Also, should these changes go upstream to https://github.com/3rdparty/libprocess and/or directly into Mesos? [~vinodkone], if you think we should start with HTTP/2 and leave gRPC for later, what are your thoughts on using libnghttp2_asio vs. implementing support directly in what exists in http.cpp and http.hpp?
[jira] [Comment Edited] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497621#comment-15497621 ]

Aaron Wood edited comment on MESOS-6127 at 9/16/16 10:53 PM:
-

Also, should these changes go upstream to https://github.com/3rdparty/libprocess and/or directly into Mesos? [~vinodkone], if you think we should start with HTTP/2 and leave gRPC for later, what are your thoughts on using libnghttp2_asio vs. implementing support directly in what exists in http.cpp and http.hpp?

was (Author: aaronjwood):
Also, should these changes go upstream to https://github.com/3rdparty/libprocess and/or directly into Mesos?
[jira] [Commented] (MESOS-6181) The logic for BadACLNoPrincipal and BadACLDropCreateAndDestroy is not correct
[ https://issues.apache.org/jira/browse/MESOS-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497632#comment-15497632 ]

Guangya Liu commented on MESOS-6181:

cc [~greggomann]

> The logic for BadACLNoPrincipal and BadACLDropCreateAndDestroy is not correct
> -
>
> Key: MESOS-6181
> URL: https://issues.apache.org/jira/browse/MESOS-6181
> Project: Mesos
> Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> There are two issues with these two test cases:
> 1) The `{}` in the test cases is unnecessary; adding it causes the driver
> to decline a nonexistent offer.
> 2) If destroying the volume fails, we should get the last offer to make sure
> that it also contains the volume resource.
[jira] [Commented] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497625#comment-15497625 ]

Vinod Kone commented on MESOS-6127:
---

Directly to Mesos. The 3rdparty repo is not up-to-date.
[jira] [Commented] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497624#comment-15497624 ]

Vinod Kone commented on MESOS-6127:
---

There is no standard format, but you can look at a few examples: https://github.com/apache/mesos/blob/6f970a7badacf16953ebbc2c72c6ae7eb5e662e2/docs/design-docs.md
[jira] [Commented] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497621#comment-15497621 ]

Aaron Wood commented on MESOS-6127:
---

Also, should these changes go upstream to https://github.com/3rdparty/libprocess and/or directly into Mesos?
[jira] [Commented] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497593#comment-15497593 ]

Aaron Wood commented on MESOS-6127:
---

Yes, definitely. Is there a defined design document process in place for Apache/Mesos?
[jira] [Created] (MESOS-6189) Add a virtual method to Isolator to indicate if it supports nesting.
Jie Yu created MESOS-6189:
-

Summary: Add a virtual method to Isolator to indicate if it supports nesting.
Key: MESOS-6189
URL: https://issues.apache.org/jira/browse/MESOS-6189
Project: Mesos
Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu
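One plausible shape for this, sketched in Python rather than C++ and using hypothetical names (`Isolator.supports_nesting`, `LegacyIsolator`, `NestingAwareIsolator`, `isolators_for`): the base class returns a conservative default, isolators that can handle nested containers override it, and the containerizer applies only nesting-aware isolators to nested containers. This is an assumption about the design, not the interface that was committed for this ticket.

```python
from typing import List


class Isolator:
    """Hypothetical stand-in for an isolator interface."""
    name = "base"

    def supports_nesting(self) -> bool:
        # Conservative default: isolators must opt in explicitly.
        return False


class LegacyIsolator(Isolator):
    name = "legacy"  # inherits the default (no nesting support)


class NestingAwareIsolator(Isolator):
    name = "nesting-aware"

    def supports_nesting(self) -> bool:
        return True


def isolators_for(isolators: List[Isolator], nested: bool) -> List[str]:
    """Top-level containers get every isolator; nested ones only those that opt in."""
    return [i.name for i in isolators if not nested or i.supports_nesting()]
```

Defaulting to `False` means an isolator written before nesting existed is skipped for nested containers instead of silently misbehaving, and making a particular isolator nesting aware becomes a one-method change.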
[jira] [Commented] (MESOS-6127) Implement suppport for HTTP/2
[ https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497438#comment-15497438 ]

Vinod Kone commented on MESOS-6127:
---

It is not trivial to update libprocess (the internal communication library that mesos uses) to speak gRPC. It might be relatively easier to add support for HTTP/2 into libprocess. Note that libprocess already supports SSL/TLS when used in conjunction with libevent.

Either way, this needs an extensive design doc. [~aaronjwood] Is this something you are interested in working on?
[jira] [Updated] (MESOS-6183) mesos's Unified Containerizer cannot set "--insecure-registry" when provisioning images
[ https://issues.apache.org/jira/browse/MESOS-6183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-6183: - Flags: (was: Important) Priority: Minor (was: Major) The Mesos containerizer doesn't have an option/flag for insecure registries. What task are you launching? > mesos's Unified Containerizer cannot set "--insecure-registry" when > provisioning images > --- > > Key: MESOS-6183 > URL: https://issues.apache.org/jira/browse/MESOS-6183 > Project: Mesos > Issue Type: Bug >Reporter: yongyu >Priority: Minor > > mesos's Unified Containerizer cannot set "--insecure-registry" when > provisioning images -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6188) Make the `gpu/nvidia` isolator nesting aware
Kevin Klues created MESOS-6188: -- Summary: Make the `gpu/nvidia` isolator nesting aware Key: MESOS-6188 URL: https://issues.apache.org/jira/browse/MESOS-6188 Project: Mesos Issue Type: Task Reporter: Kevin Klues Assignee: Kevin Klues Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6187) "double free or corruption" with Java 8
Neil Conway created MESOS-6187: -- Summary: "double free or corruption" with Java 8 Key: MESOS-6187 URL: https://issues.apache.org/jira/browse/MESOS-6187 Project: Mesos Issue Type: Bug Environment: Linux archlinux.vagrant.vm 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux Reporter: Neil Conway Observed running the unit tests on recent Arch Linux. Haven't repro'd yet; took ~17 iterations of the entire test suite (~2 hours of runtime) before it occurred. {{noformat}} % ./src/mesos-tests --gtest_repeat=30 >>& ~/test_log.txt *** Error in `/usr/lib/jvm/java-8-openjdk/bin/java': double free or corruption (fasttop): 0x7f102c00c760 *** === Backtrace: = /usr/lib/libc.so.6(+0x70c4b)[0x7f1097f25c4b] /usr/lib/libc.so.6(+0x76fe6)[0x7f1097f2bfe6] /usr/lib/libc.so.6(+0x777de)[0x7f1097f2c7de] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2NS1_9_DeferredIDTcl4bindadsrSt8functionIFvT0_T1_EEclcvSF__Efp1_fp2_RKNS1_3PIDIT_EEMSJ_FvSC_SD_ET2_T3_EUliiE_E10_M_destroyERSt9_Any_dataSt17integral_constantIbLb0EE+0x31)[0x7f1035f838b1] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2NS1_9_DeferredIDTcl4bindadsrSt8functionIFvT0_T1_EEclcvSF__Efp1_fp2_RKNS1_3PIDIT_EEMSJ_FvSC_SD_ET2_T3_EUliiE_E10_M_managerERSt9_Any_dataRKST_St18_Manager_operation+0x9e)[0x7f1035f70598] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_baseD1Ev+0x33)[0x7f10357576f7] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt8functionIFviiEED1Ev+0x18)[0x7f1035f2e4b6] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt10_Head_baseILm0ESt8functionIFviiEELb0EED1Ev+0x18)[0x7f1035f2f5ce] 
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt11_Tuple_implILm0EJSt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2D1Ev+0x18)[0x7f1035f2f5ea] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt5tupleIJSt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2D1Ev+0x18)[0x7f1035f2f606] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES3_St12_PlaceholderILi1EES7_ILi2D1Ev+0x1c)[0x7f1035f2f626] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_base13_Base_managerISt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES5_St12_PlaceholderILi1EES9_ILi2E10_M_destroyERSt9_Any_dataSt17integral_constantIbLb0EE+0x29)[0x7f1035f83c31] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_base13_Base_managerISt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES5_St12_PlaceholderILi1EES9_ILi2E10_M_managerERSt9_Any_dataRKSF_St18_Manager_operation+0x9e)[0x7f1035f70857] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_baseD1Ev+0x33)[0x7f10357576f7] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt8functionIFviiEED1Ev+0x18)[0x7f1035f2e4b6] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZN2os8internal15configureSignalEPKSt8functionIFviiEE+0x3b)[0x7f1035eebbe7] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZN5mesos8internal5slave5Slave10initializeEv+0x44af)[0x7f1035ef15c5] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZN7process14ProcessManager6resumeEPNS_11ProcessBaseE+0x283)[0x7f1036b4552f] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(+0x4d86450)[0x7f1036b42450] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(+0x4d9450e)[0x7f1036b5050e] /home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(+0x4d944ab)[0x7f1036b504ab] 
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(+0x4d9448a)[0x7f1036b5048a] /usr/lib/libstdc++.so.6(+0xbb31f)[0x7f106eb1031f] /usr/lib/libpthread.so.0(+0x7454)[0x7f1098881454] /usr/lib/libc.so.6(clone+0x5f)[0x7f1097f9d7df] === Memory map: 0040-00401000 r-xp 08:01 183769 /usr/lib/jvm/java-8-openjdk/jre/bin/java 0060-00601000 rw-p 08:01 183769 /usr/lib/jvm/java-8-openjdk/jre/bin/java 022b-022d1000 rw-p 00:00 0 [heap] 8320-8860 rw-p 00:00 0 8860-d660 ---p 00:00 0 d660-d900 rw-p 00:00 0 d900-1 ---p 00:00 0 1-1000a rw-p 00:00 0 1000a-14000 ---p 00:00 0 7f100400-7f1004026000 rw-p 00:00 0 7f1004026000-7f100800 ---p 00:00 0 7f100800-7f10080210
[jira] [Created] (MESOS-6186) Make the generic `cgroups` isolator nesting aware
Kevin Klues created MESOS-6186: -- Summary: Make the generic `cgroups` isolator nesting aware Key: MESOS-6186 URL: https://issues.apache.org/jira/browse/MESOS-6186 Project: Mesos Issue Type: Task Reporter: Kevin Klues Assignee: Kevin Klues Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6145) Isolator namespaces/pid is leaking mounts
[ https://issues.apache.org/jira/browse/MESOS-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497064#comment-15497064 ] Jie Yu commented on MESOS-6145: --- Simplified the isolator: https://reviews.apache.org/r/51963/ The bind mounts in the pid namespace isolator turn out to be unnecessary, as the Linux launcher will use the freezer cgroup to kill all tasks anyway. They make the isolator unnecessarily complex and cause a mount leak bug (MESOS-6145). This patch removes all the unnecessary bind mounts, making the isolator extremely simple. > Isolator namespaces/pid is leaking mounts > - > > Key: MESOS-6145 > URL: https://issues.apache.org/jira/browse/MESOS-6145 > Project: Mesos > Issue Type: Bug > Components: containerization, isolation, security >Reporter: Stephan Erb >Assignee: Jie Yu > > As the operator of a Mesos cluster, I would like every container/executor to > run in a single PID namespace, so that a task cannot see what else is running > on the same host. > The existing {{namespaces/pid}} isolator seems to provide this feature. > However, it seems like it is leaking files. I have exactly one task running > currently, but there are still leftovers from earlier invocations: > {code} > vagrant@aurora:~/aurora$ ls -l /var/run/mesos/pidns/ > total 0 > -rw-r--r-- 1 root root 0 Aug 26 20:30 32b6e4c7-3d22-47ed-a350-9eb929daa241 > -rw-r--r-- 1 root root 0 Aug 26 20:30 7b812f00-4614-4016-a76c-ff78a175a1b0 > -rw-r--r-- 1 root root 0 Aug 26 20:24 d501829e-7cf8-40fb-a895-0ad3416da7dc > -rw-r--r-- 1 root root 0 Aug 26 20:24 d56ca91f-eb72-426c-8bbb-f3239358a4ef > -r--r--r-- 1 root root 0 Aug 26 20:35 fef9a109-de52-45f3-ae41-171de6495705 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
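Until the simplified isolator is deployed, an operator might want to spot leftovers like the ones in the quoted listing. A hypothetical C++17 sketch: compare the entries of a directory such as /var/run/mesos/pidns against a set of container IDs known to be live. How that set is obtained is left open, and the function name is illustrative, not part of Mesos.

```cpp
#include <algorithm>
#include <filesystem>
#include <set>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Returns the (sorted) names of entries in `dir` that are not in `live`,
// i.e. candidates for leftovers from containers that no longer exist.
// If `dir` cannot be opened, the result is simply empty.
std::vector<std::string> staleEntries(
    const fs::path& dir,
    const std::set<std::string>& live)
{
  std::vector<std::string> stale;
  std::error_code ec;
  for (const auto& entry : fs::directory_iterator(dir, ec)) {
    const std::string name = entry.path().filename().string();
    if (live.count(name) == 0) {
      stale.push_back(name);
    }
  }
  // Directory iteration order is unspecified, so sort for stable output.
  std::sort(stale.begin(), stale.end());
  return stale;
}
```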
[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early
[ https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496966#comment-15496966 ] Vinod Kone commented on MESOS-6180: --- Looking at `CGROUPS_ROOT_PidNamespaceForward`, the TASK_LOST is expected because the test doesn't wait for the TASK_RUNNING update before terminating the agent.
{quote}
Future<Message> registerExecutorMessage =
  FUTURE_MESSAGE(Eq(RegisterExecutorMessage().GetTypeName()), _, _);

driver.launchTasks(offers1.get()[0].id(), {task1});

AWAIT_READY(registerExecutorMessage);

Future<hashset<ContainerID>> containers = containerizer->containers();
AWAIT_READY(containers);

EXPECT_EQ(1u, containers.get().size());

ContainerID containerId = *(containers.get().begin());

// Stop the slave.
slave.get()->terminate();
{quote}
> Several tests are flaky, with futures timing out early > -- > > Key: MESOS-6180 > URL: https://issues.apache.org/jira/browse/MESOS-6180 > Project: Mesos > Issue Type: Bug > Components: tests >Reporter: Greg Mann >Assignee: haosdent > Labels: mesosphere, tests > Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, > CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log > > > Following the merging of a large patch chain, it was noticed on our internal > CI that several tests had become flaky, with a similar pattern in the > failures: the tests fail early when a future times out. Often, this occurs > when a test cluster is being spun up and one of the offer futures times out.
> This has been observed in the following tests: > * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward > * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward > * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch > * RoleTest.ImplicitRoleRegister > * SlaveRecoveryTest/0.MultipleFrameworks > * SlaveRecoveryTest/0.ReconcileShutdownFramework > * SlaveTest.ContainerizerUsageFailure > * MesosSchedulerDriverTest.ExplicitAcknowledgements > * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164) > * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165) > * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166) > See the linked JIRAs noted above for individual tickets addressing a couple > of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early
[ https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496724#comment-15496724 ] haosdent commented on MESOS-6180: - Yep, the order of the log you mentioned is correct as well. Let's split it to stdout and stderr. {code:title=grep -v 'W:' (stdout)|borderStyle=solid} [02:57:42] : [Step 10/10] [ RUN ] SlaveRecoveryTest/0.ReconnectHTTPExecutor [02:57:43] : [Step 10/10] Received SUBSCRIBED event [02:57:43] : [Step 10/10] Subscribed executor on ip-172-30-2-23.mesosphere.io [02:57:43] : [Step 10/10] Received LAUNCH event [02:57:43] : [Step 10/10] Starting task c1ba3f0b-2f6a-46a1-b752-592394c6d726 [02:57:43] : [Step 10/10] /mnt/teamcity/work/4240ba9ddd0997c3/build/src/mesos-containerizer launch --command="{"shell":true,"value":"sleep 1000"}" --help="false" --unshare_namespace_mnt="false" [02:57:43] : [Step 10/10] Forked command at 4653 [02:57:43] : [Step 10/10] Received ERROR event [02:57:43] : [Step 10/10] Received ERROR event [02:57:58] : [Step 10/10] ../../src/tests/slave_recovery_tests.cpp:510: Failure [02:57:58] : [Step 10/10] Failed to wait 15secs for status [02:57:58] : [Step 10/10] ../../src/tests/slave_recovery_tests.cpp:491: Failure [02:57:58] : [Step 10/10] Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(_, _))... [02:57:58] : [Step 10/10] Expected: to be called at least once [02:57:58] : [Step 10/10]Actual: never called - unsatisfied and active [02:58:13] : [Step 10/10] ../../src/tests/cluster.cpp:560: Failure [02:58:13] : [Step 10/10] Failed to wait 15secs for wait [02:59:18] : [Step 10/10] [ FAILED ] SlaveRecoveryTest/0.ReconnectHTTPExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (95963 ms) {code} {code:title=grep 'W:' (stdout - SlaveRecoveryTest/0.RecoverStatusUpdateManager)|borderStyle=solid} [02:59:18]W: [Step 10/10] I0915 02:57:42.726838 24222 hierarchical.cpp:1770] No inverse offers to send out! 
[02:59:18]W: [Step 10/10] I0915 02:57:42.726851 24222 hierarchical.cpp:1271] Performed allocation for 1 agents in 80513ns [02:59:18]W: [Step 10/10] I0915 02:57:42.929819 24218 slave.cpp:3521] Cleaning up un-reregistered executors [02:59:18]W: [Step 10/10] I0915 02:57:42.929872 24218 slave.cpp:5197] Finished recovery [02:59:18]W: [Step 10/10] I0915 02:57:42.930137 24218 slave.cpp:5369] Querying resource estimator for oversubscribable resources [02:59:18]W: [Step 10/10] I0915 02:57:42.930229 24220 slave.cpp:5383] Received oversubscribable resources from the resource estimator [02:59:18]W: [Step 10/10] I0915 02:57:42.930289 24220 slave.cpp:911] New master detected at master@172.30.2.23:32968 [02:59:18]W: [Step 10/10] I0915 02:57:42.930301 24220 slave.cpp:970] Authenticating with master master@172.30.2.23:32968 [02:59:18]W: [Step 10/10] I0915 02:57:42.930315 24220 slave.cpp:981] Using default CRAM-MD5 authenticatee [02:59:18]W: [Step 10/10] I0915 02:57:42.930336 24217 status_update_manager.cpp:177] Pausing sending status updates [02:59:18]W: [Step 10/10] I0915 02:57:42.930364 24220 slave.cpp:943] Detecting new master [02:59:18]W: [Step 10/10] I0915 02:57:42.930382 24217 authenticatee.cpp:121] Creating new client SASL connection [02:59:18]W: [Step 10/10] I0915 02:57:42.930631 24216 master.cpp:6234] Authenticating slave(353)@172.30.2.23:32968 [02:59:18]W: [Step 10/10] I0915 02:57:42.930697 24219 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(755)@172.30.2.23:32968 [02:59:18]W: [Step 10/10] I0915 02:57:42.930804 24218 authenticator.cpp:98] Creating new server SASL connection [02:59:18]W: [Step 10/10] I0915 02:57:42.930964 24218 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5 [02:59:18]W: [Step 10/10] I0915 02:57:42.930977 24218 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5' [02:59:18]W: [Step 10/10] I0915 02:57:42.931010 24218 authenticator.cpp:204] Received SASL authentication start 
[02:59:18]W: [Step 10/10] I0915 02:57:42.931037 24218 authenticator.cpp:326] Authentication requires more steps [02:59:18]W: [Step 10/10] I0915 02:57:42.931064 24218 authenticatee.cpp:259] Received SASL authentication step [02:59:18]W: [Step 10/10] I0915 02:57:42.931098 24218 authenticator.cpp:232] Received SASL authentication step [02:59:18]W: [Step 10/10] I0915 02:57:42.931109 24218 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: 'ip-172-30-2-23.mesosphere.io' server FQDN: 'ip-172-30-2-23.mesosphere.io' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false [02:59:18]W: [Step 10/10] I0915 02:57:42.931114 24218 auxprop.cpp:181] Looking up auxiliary property '*userPassw
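A minimal C++ sketch of the split haosdent describes, mirroring the two grep commands in the {code} titles: lines containing the `W:` marker go into one bucket and all other lines into the other. The function name and the default marker argument are illustrative only; like grep, it matches the marker anywhere in a line, not only right after the timestamp.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Splits a build log into (lines without marker, lines with marker),
// equivalent to running `grep -v 'W:'` and `grep 'W:'` on the same input.
std::pair<std::vector<std::string>, std::vector<std::string>>
splitByMarker(const std::string& log, const std::string& marker = "W:")
{
  std::vector<std::string> without;
  std::vector<std::string> with;
  std::istringstream in(log);
  std::string line;
  while (std::getline(in, line)) {
    if (line.find(marker) == std::string::npos) {
      without.push_back(line);
    } else {
      with.push_back(line);
    }
  }
  return {without, with};
}
```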
[jira] [Updated] (MESOS-4431) Sharing of persistent volumes via reference counting
[ https://issues.apache.org/jira/browse/MESOS-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-4431: -- Affects Version/s: (was: 0.25.0) > Sharing of persistent volumes via reference counting > > > Key: MESOS-4431 > URL: https://issues.apache.org/jira/browse/MESOS-4431 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Anindya Sinha >Assignee: Anindya Sinha > Labels: persistent-volumes > Fix For: 1.1.0 > > > Add capability for specific resources to be shared amongst tasks within or > across frameworks/roles. Enable this functionality for persistent volumes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4431) Sharing of persistent volumes via reference counting
[ https://issues.apache.org/jira/browse/MESOS-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-4431: -- Fix Version/s: 1.1.0 > Sharing of persistent volumes via reference counting > > > Key: MESOS-4431 > URL: https://issues.apache.org/jira/browse/MESOS-4431 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Anindya Sinha >Assignee: Anindya Sinha > Labels: persistent-volumes > Fix For: 1.1.0 > > > Add capability for specific resources to be shared amongst tasks within or > across frameworks/roles. Enable this functionality for persistent volumes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4325) Offer shareable resources to frameworks only if it is opted in
[ https://issues.apache.org/jira/browse/MESOS-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-4325: -- Affects Version/s: (was: 0.25.0) > Offer shareable resources to frameworks only if it is opted in > -- > > Key: MESOS-4325 > URL: https://issues.apache.org/jira/browse/MESOS-4325 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Anindya Sinha >Assignee: Anindya Sinha >Priority: Minor > Labels: external-volumes, persistent-volumes > Fix For: 1.1.0 > > > Added a new capability SHAREABLE_RESOURCES that frameworks need to opt into if > they are interested in receiving shared resources in their offers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early
[ https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496662#comment-15496662 ] Greg Mann commented on MESOS-6180: -- Thanks for the patches, [~haosd...@gmail.com]!! I'll review and do some testing this morning. Regarding the interleaving: for example, in the log posted in MESOS-6164 we find the line: {code} Checkpointing framework pid 'scheduler-26d5bb2d-7233-4725-9755-169f84aee769@172.30.2.23:32968' to '/mnt/teamcity/temp/buildTmp/SlaveRecoveryTest_0_RecoverStatusUpdateManager_w0ToCt/meta/slaves/d22b6309-24c3-422f-a501-a672e7c3e046-S0/frameworks/d22b6309-24c3-422f-a501-a672e7c3e046-/framework.pid' {code} which indicates that this output can be attributed to {{SlaveRecoveryTest.RecoverStatusUpdateManager}}. I think {{SlaveRecoveryTest.ReconnectHTTPExecutor}} begins much later with the line: {{I0915 02:57:42.981866 24202 cluster.cpp:157] Creating default 'local' authorizer}}. > Several tests are flaky, with futures timing out early > -- > > Key: MESOS-6180 > URL: https://issues.apache.org/jira/browse/MESOS-6180 > Project: Mesos > Issue Type: Bug > Components: tests >Reporter: Greg Mann >Assignee: haosdent > Labels: mesosphere, tests > Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, > CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log > > > Following the merging of a large patch chain, it was noticed on our internal > CI that several tests had become flaky, with a similar pattern in the > failures: the tests fail early when a future times out. Often, this occurs > when a test cluster is being spun up and one of the offer futures times out. 
> This has been observed in the following tests: > * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward > * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward > * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch > * RoleTest.ImplicitRoleRegister > * SlaveRecoveryTest/0.MultipleFrameworks > * SlaveRecoveryTest/0.ReconcileShutdownFramework > * SlaveTest.ContainerizerUsageFailure > * MesosSchedulerDriverTest.ExplicitAcknowledgements > * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164) > * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165) > * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166) > See the linked JIRAs noted above for individual tickets addressing a couple > of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6185) Improve test coverage for shared persistent volumes.
Yan Xu created MESOS-6185: - Summary: Improve test coverage for shared persistent volumes. Key: MESOS-6185 URL: https://issues.apache.org/jira/browse/MESOS-6185 Project: Mesos Issue Type: Task Affects Versions: 1.1.0 Reporter: Yan Xu Assignee: Anindya Sinha In addition to the tests in MESOS-4431, we need to improve coverage of the new code paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4325) Offer shareable resources to frameworks only if it is opted in
[ https://issues.apache.org/jira/browse/MESOS-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-4325: -- Fix Version/s: 1.1.0 > Offer shareable resources to frameworks only if it is opted in > -- > > Key: MESOS-4325 > URL: https://issues.apache.org/jira/browse/MESOS-4325 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Anindya Sinha >Assignee: Anindya Sinha >Priority: Minor > Labels: external-volumes, persistent-volumes > Fix For: 1.1.0 > > > Added a new capability SHAREABLE_RESOURCES that frameworks need to opt into if > they are interested in receiving shared resources in their offers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4431) Support sharing of persistent volumes via shared resources.
[ https://issues.apache.org/jira/browse/MESOS-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-4431: -- Summary: Support sharing of persistent volumes via shared resources. (was: Sharing of persistent volumes via reference counting) > Support sharing of persistent volumes via shared resources. > --- > > Key: MESOS-4431 > URL: https://issues.apache.org/jira/browse/MESOS-4431 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Anindya Sinha >Assignee: Anindya Sinha > Labels: persistent-volumes > Fix For: 1.1.0 > > > Add capability for specific resources to be shared amongst tasks within or > across frameworks/roles. Enable this functionality for persistent volumes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4431) Sharing of persistent volumes via reference counting
[ https://issues.apache.org/jira/browse/MESOS-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-4431: -- Labels: persistent-volumes (was: external-volumes persistent-volumes) > Sharing of persistent volumes via reference counting > > > Key: MESOS-4431 > URL: https://issues.apache.org/jira/browse/MESOS-4431 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Anindya Sinha >Assignee: Anindya Sinha > Labels: persistent-volumes > Fix For: 1.1.0 > > > Add capability for specific resources to be shared amongst tasks within or > across frameworks/roles. Enable this functionality for persistent volumes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
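The original summary describes sharing "via reference counting". A hypothetical sketch of that idea: each task that uses a shared persistent volume takes a reference, and the volume only becomes free again once the count drops back to zero. The class and method names are illustrative and are not taken from the Mesos allocator or agent code.

```cpp
#include <map>
#include <string>

// Hypothetical per-volume reference counter for shared persistent volumes.
class SharedVolumeTracker
{
public:
  // A task starts using `volume`.
  void acquire(const std::string& volume) { ++counts[volume]; }

  // A task stops using `volume`; returns false if it was not in use.
  bool release(const std::string& volume)
  {
    auto it = counts.find(volume);
    if (it == counts.end()) {
      return false;
    }
    if (--it->second == 0) {
      counts.erase(it);  // Last user gone: the volume is free again.
    }
    return true;
  }

  bool inUse(const std::string& volume) const
  {
    return counts.count(volume) > 0;
  }

  int users(const std::string& volume) const
  {
    auto it = counts.find(volume);
    return it == counts.end() ? 0 : it->second;
  }

private:
  std::map<std::string, int> counts;
};
```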
[jira] [Updated] (MESOS-4325) Offer shareable resources to frameworks only if it is opted in
[ https://issues.apache.org/jira/browse/MESOS-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-4325: -- Labels: persistent-volumes (was: external-volumes persistent-volumes) > Offer shareable resources to frameworks only if it is opted in > -- > > Key: MESOS-4325 > URL: https://issues.apache.org/jira/browse/MESOS-4325 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Anindya Sinha >Assignee: Anindya Sinha >Priority: Minor > Labels: persistent-volumes > Fix For: 1.1.0 > > > Added a new capability SHAREABLE_RESOURCES that frameworks need to opt into if > they are interested in receiving shared resources in their offers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
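A hypothetical sketch of the opt-in rule described in this ticket: shared resources are filtered out of an offer unless the framework advertises the SHAREABLE_RESOURCES capability. The Resource and Framework structs are simplified stand-ins, not the Mesos protobufs, and the capability is modeled as a plain string.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified types for illustration only.
struct Resource
{
  std::string name;
  bool shared;
};

struct Framework
{
  std::set<std::string> capabilities;
};

// Returns the subset of `resources` that may be offered to `framework`:
// non-shared resources always, shared ones only after the framework opts in.
std::vector<Resource> offerableTo(
    const Framework& framework,
    const std::vector<Resource>& resources)
{
  const bool optedIn =
    framework.capabilities.count("SHAREABLE_RESOURCES") > 0;

  std::vector<Resource> result;
  for (const Resource& resource : resources) {
    if (!resource.shared || optedIn) {
      result.push_back(resource);
    }
  }
  return result;
}
```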
[jira] [Created] (MESOS-6184) Change health check to use childHooks to enter the namespaces of the container
haosdent created MESOS-6184: --- Summary: Change health check to use childHooks to enter the namespaces of the container Key: MESOS-6184 URL: https://issues.apache.org/jira/browse/MESOS-6184 Project: Mesos Issue Type: Improvement Reporter: haosdent Assignee: haosdent To perform health checks for tasks, we need to enter the corresponding namespaces of the container. Currently, the health check uses a custom clone function to implement this:
{code}
return process::defaultClone([=]() -> int {
  if (taskPid.isSome()) {
    foreach (const string& ns, namespaces) {
      Try<Nothing> setns = ns::setns(taskPid.get(), ns);
      if (setns.isError()) {
        ...
      }
    }
  }

  return func();
});
{code}
Once the childHooks patches are merged, we could change the health check to use childHooks to call {{setns}} and make {{process::defaultClone}} private again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
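A minimal sketch of the step such a child hook would perform, written against the raw Linux setns(2) call rather than the Mesos ns::setns wrapper or the libprocess childHooks API (neither is modeled here); the function names are hypothetical. Per the setns(2) man page, joining a user or mount namespace fails for a multithreaded caller, which is one reason to do this in the freshly forked child rather than in the agent or executor process.

```cpp
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <string>
#include <vector>

// Path of the procfs handle for namespace type `ns` (e.g. "net", "mnt")
// of process `pid`.
std::string namespacePath(pid_t pid, const std::string& ns)
{
  return "/proc/" + std::to_string(pid) + "/ns/" + ns;
}

// Joins the namespaces of `pid` listed in `namespaces`. Returns an empty
// string on success, or an error message. Note that joining most namespace
// types requires CAP_SYS_ADMIN.
std::string enterNamespaces(
    pid_t pid,
    const std::vector<std::string>& namespaces)
{
  for (const std::string& ns : namespaces) {
    const std::string path = namespacePath(pid, ns);
    int fd = ::open(path.c_str(), O_RDONLY | O_CLOEXEC);
    if (fd < 0) {
      return "Failed to open '" + path + "': " + std::strerror(errno);
    }
    // nstype 0: accept whatever namespace type `fd` refers to.
    if (::setns(fd, 0) != 0) {
      const std::string error = std::strerror(errno);
      ::close(fd);
      return "Failed to enter '" + path + "': " + error;
    }
    ::close(fd);
  }
  return "";
}
```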
[jira] [Commented] (MESOS-6183) mesos's Unified Containerizer cannot set "--insecure-registry" when provisioning images
[ https://issues.apache.org/jira/browse/MESOS-6183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496498#comment-15496498 ] yongyu commented on MESOS-6183: --- I get the error: Subscribed with ID 'fd64c22d-cca5-4a65-b275-fef05fc63730-0920' Submitted task 'test_mesos' to agent 'fd64c22d-cca5-4a65-b275-fef05fc63730-S10' Received status update TASK_FAILED for task 'test_mesos' message: 'Failed to launch container: Failed to perform 'curl': curl: (35) SSL received a record that exceeded the maximum permissible length. ; Container destroyed while provisioning images' source: SOURCE_AGENT reason: REASON_CONTAINER_LAUNCH_FAILED > mesos's Unified Containerizer cannot set "--insecure-registry" when > provisioning images > --- > > Key: MESOS-6183 > URL: https://issues.apache.org/jira/browse/MESOS-6183 > Project: Mesos > Issue Type: Bug >Reporter: yongyu > > mesos's Unified Containerizer cannot set "--insecure-registry" when > provisioning images -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6183) mesos's Unified Containerizer cannot set "--insecure-registry" when provisioning images
yongyu created MESOS-6183: - Summary: mesos's Unified Containerizer cannot set "--insecure-registry" when provisioning images Key: MESOS-6183 URL: https://issues.apache.org/jira/browse/MESOS-6183 Project: Mesos Issue Type: Bug Reporter: yongyu mesos's Unified Containerizer cannot set "--insecure-registry" when provisioning images -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6182) LinuxRootfs::create ignores failures from adding non-existing files
[ https://issues.apache.org/jira/browse/MESOS-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496439#comment-15496439 ] Benjamin Bannier commented on MESOS-6182: - That is exactly what I tried, but since even on supported platforms like ubuntu-14 we try to add non-existent files, making this case fatal requires fixing the fallout (for some reason, the existing test suite seems to run successfully even though some files are missing from the rootfs :/). Maybe [~gilbert] knows if we can throw out stuff we don't need; otherwise we'd need to wait for the proper solution for MESOS-6011. > LinuxRootfs::create ignores failures from adding non-existing files > --- > > Key: MESOS-6182 > URL: https://issues.apache.org/jira/browse/MESOS-6182 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier > > {{LinuxRootfs::create}} attempts to add a hardcoded list of files to the > created rootfs. However, if a file does not exist no failure is created, but > the file will be missing from the rootfs. > This can then lead to failures in tests using the rootfs and relying on files > in it. > We should make failures to compose the planned rootfs explicit so users of > this test code know what they can rely on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6182) LinuxRootfs::create ignores failures from adding non-existing files
[ https://issues.apache.org/jira/browse/MESOS-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496363#comment-15496363 ] Alexander Rukletsov commented on MESOS-6182: As a first step, can we make {{LinuxRootfs::create()}} return an error in case {{os::realpath()}} returns {{None()}}, which indicates the file can't be found? > LinuxRootfs::create ignores failures from adding non-existing files > --- > > Key: MESOS-6182 > URL: https://issues.apache.org/jira/browse/MESOS-6182 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier > > {{LinuxRootfs::create}} attempts to add a hardcoded list of files to the > created rootfs. However, if a file does not exist no failure is created, but > the file will be missing from the rootfs. > This can then lead to failures in tests using the rootfs and relying on files > in it. > We should make failures to compose the planned rootfs explicit so users of > this test code know what they can rely on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
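A minimal sketch of the suggestion: resolve every file up front and treat "not found" as fatal instead of silently skipping it. It uses std::filesystem::canonical in place of stout's os::realpath, and a hypothetical three-valued enum in place of stout's Result<std::string> (found / not found / other error); the function names are illustrative.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical three-state outcome, loosely mirroring Some / None / Error.
enum class Lookup { FOUND, NOT_FOUND, FAILED };

Lookup resolve(const std::string& path, std::string* resolved, std::string* error)
{
  std::error_code ec;
  fs::path real = fs::canonical(path, ec);
  if (!ec) {
    *resolved = real.string();
    return Lookup::FOUND;
  }
  if (ec == std::errc::no_such_file_or_directory) {
    return Lookup::NOT_FOUND;
  }
  *error = ec.message();
  return Lookup::FAILED;
}

// Resolves all `files`, appending real paths to `resolved`. Returns an empty
// string on success or a message describing the first problem encountered.
std::string checkFiles(
    const std::vector<std::string>& files,
    std::vector<std::string>* resolved)
{
  for (const std::string& file : files) {
    std::string real;
    std::string error;
    switch (resolve(file, &real, &error)) {
      case Lookup::FOUND:
        resolved->push_back(real);
        break;
      case Lookup::NOT_FOUND:
        return "File not found: " + file;
      case Lookup::FAILED:
        return "Failed to resolve '" + file + "': " + error;
    }
  }
  return "";
}
```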
[jira] [Commented] (MESOS-6182) LinuxRootfs::create ignores failures from adding non-existing files
[ https://issues.apache.org/jira/browse/MESOS-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496333#comment-15496333 ] Benjamin Bannier commented on MESOS-6182: - Making failure to copy a file fatal in {{LinuxRootfs::create}} is technically not hard (just check all three states of the {{Result}} of {{realpath}}), but it looks like the existing list of files to copy already contains files which do not exist even on core supported distributions. On an ubuntu-14 image, I get:
{code}
for f in /bin/echo /usr/bin/bash /bin/ls /bin/sh /bin/sleep /lib/x86_64-linux-gnu /lib64/ld-linux-x86-64.so.2 /lib64/libc.so.6 /lib64/libdl.so.2 /lib64/libtinfo.so.5 /lib64/libselinux.so.1 /lib64/libpcre.so.1 /lib64/liblzma.so.5 /lib64/libpthread.so.0 /lib64/libcap.so.2 /lib64/libacl.so.1 /lib64/libattr.so.1 /lib64/librt.so.1 /etc/passwd; do ls -d $f; done
/bin/echo
ls: cannot access /usr/bin/bash: No such file or directory
/bin/ls
/bin/sh
/bin/sleep
/lib/x86_64-linux-gnu
/lib64/ld-linux-x86-64.so.2
ls: cannot access /lib64/libc.so.6: No such file or directory
ls: cannot access /lib64/libdl.so.2: No such file or directory
ls: cannot access /lib64/libtinfo.so.5: No such file or directory
ls: cannot access /lib64/libselinux.so.1: No such file or directory
ls: cannot access /lib64/libpcre.so.1: No such file or directory
ls: cannot access /lib64/liblzma.so.5: No such file or directory
ls: cannot access /lib64/libpthread.so.0: No such file or directory
ls: cannot access /lib64/libcap.so.2: No such file or directory
ls: cannot access /lib64/libacl.so.1: No such file or directory
ls: cannot access /lib64/libattr.so.1: No such file or directory
ls: cannot access /lib64/librt.so.1: No such file or directory
/etc/passwd
{code}
It looks like this problem requires a more general solution for MESOS-6011 where we, e.g., pick up executable paths from {{PATH}} and their dynamic dependencies from the ELF headers.
> LinuxRootfs::create ignores failures from adding non-existing files > --- > > Key: MESOS-6182 > URL: https://issues.apache.org/jira/browse/MESOS-6182 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier > > {{LinuxRootfs::create}} attempts to add a hardcoded list of files to the > created rootfs. However, if a file does not exist no failure is created, but > the file will be missing from the rootfs. > This can then lead to failures in tests using the rootfs and relying on files > in it. > We should make failures to compose the planned rootfs explicit so users of > this test code know what they can rely on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
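A minimal sketch of the "pick up executable paths from {{PATH}}" idea: look a binary up in a colon-separated search path instead of hardcoding its location. In the ubuntu-14 output above, the hardcoded /usr/bin/bash is missing; a PATH lookup would find bash wherever the distribution installs it, as long as that directory is on the search path. The helper name `which` is hypothetical, and unlike a shell this sketch skips empty PATH entries instead of treating them as the current directory.

```cpp
#include <sys/stat.h>
#include <unistd.h>

#include <sstream>
#include <string>

// Returns the first regular, executable file named `name` found in the
// colon-separated directory list `path` (e.g. the value of $PATH), or an
// empty string if there is none.
std::string which(const std::string& name, const std::string& path)
{
  std::istringstream dirs(path);
  std::string dir;
  while (std::getline(dirs, dir, ':')) {
    if (dir.empty()) {
      continue;  // Simplification: ignore empty entries.
    }
    const std::string candidate = dir + "/" + name;
    struct stat s;
    // stat() follows symlinks, so /bin/sh -> dash still counts as regular.
    if (::stat(candidate.c_str(), &s) == 0 &&
        S_ISREG(s.st_mode) &&
        ::access(candidate.c_str(), X_OK) == 0) {
      return candidate;
    }
  }
  return "";
}
```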
[jira] [Commented] (MESOS-6011) Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)
[ https://issues.apache.org/jira/browse/MESOS-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496280#comment-15496280 ] Benjamin Bannier commented on MESOS-6011: - For dynamic dependencies, it looks like {{stout/elf.hpp}} added the necessary tooling for us to discover dynamic library dependencies from ELF headers. > Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others) > -- > > Key: MESOS-6011 > URL: https://issues.apache.org/jira/browse/MESOS-6011 > Project: Mesos > Issue Type: Bug > Components: containerization, tests >Reporter: Jan Schlicht >Assignee: Gilbert Song > Labels: test > > Tests like {{AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest}} and > {{ProvisionerDockerPullerTest.ROOT_LocalPullerSimpleCommand}} will fail with > Fedora 24 because {{LinuxRootFs::create}}, used in these tests, assumes that > the binaries provided by the rootfs link to certain versions of shared > libraries. Because Fedora 24 has newer versions of some of these libraries, > tests using the binaries will fail. E.g. > {noformat} > $ ldd /bin/sh > linux-vdso.so.1 (0x7ffc98bfb000) > libtinfo.so.6 => /lib64/libtinfo.so.6 (0x7fcd59df6000) > libdl.so.2 => /lib64/libdl.so.2 (0x7fcd59bf2000) > libc.so.6 => /lib64/libc.so.6 (0x7fcd5982f000) > /lib64/ld-linux-x86-64.so.2 (0x55fb8e6ea000) > {noformat} > but {{LinuxRootFs::create}} will try to copy {{/lib64/libtinfo.so.5}} into > the rootfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
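The interface of {{stout/elf.hpp}} is not reproduced here. As an illustration of what discovering dynamic dependencies from ELF headers involves, the sketch below reads the `DT_NEEDED` entries (the sonames that `ldd` would then resolve) of a 64-bit, native-endian ELF file directly with the system `<elf.h>` definitions. It is an assumption-laden stand-in, not stout's API: `neededLibraries` is a hypothetical name, it walks section headers only, and it does not map sonames such as `libtinfo.so.6` to paths on disk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <elf.h>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch: list the DT_NEEDED sonames of a 64-bit, native-endian ELF file by
// walking its section headers (no program-header or address translation).
std::vector<std::string> neededLibraries(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    throw std::runtime_error("cannot open " + path);
  }
  std::vector<char> data((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());

  if (data.size() < sizeof(Elf64_Ehdr) ||
      std::memcmp(data.data(), ELFMAG, SELFMAG) != 0 ||
      data[EI_CLASS] != ELFCLASS64) {
    throw std::runtime_error(path + " is not a 64-bit ELF file");
  }

  Elf64_Ehdr ehdr;
  std::memcpy(&ehdr, data.data(), sizeof(ehdr));

  // Copy out section header `index`, checking that it lies inside the file.
  auto section = [&](size_t index) -> Elf64_Shdr {
    size_t offset = ehdr.e_shoff + index * ehdr.e_shentsize;
    if (index >= ehdr.e_shnum || offset + sizeof(Elf64_Shdr) > data.size()) {
      throw std::runtime_error("bad section header index in " + path);
    }
    Elf64_Shdr shdr;
    std::memcpy(&shdr, data.data() + offset, sizeof(shdr));
    return shdr;
  };

  const char* fileEnd = data.data() + data.size();
  std::vector<std::string> needed;
  for (size_t i = 0; i < ehdr.e_shnum; ++i) {
    Elf64_Shdr dynamic = section(i);
    if (dynamic.sh_type != SHT_DYNAMIC) {
      continue;
    }
    // For SHT_DYNAMIC, sh_link is the index of its string table (.dynstr).
    Elf64_Shdr strtab = section(dynamic.sh_link);

    for (size_t j = 0; j < dynamic.sh_size / sizeof(Elf64_Dyn); ++j) {
      size_t offset = dynamic.sh_offset + j * sizeof(Elf64_Dyn);
      if (offset + sizeof(Elf64_Dyn) > data.size()) {
        break;
      }
      Elf64_Dyn entry;
      std::memcpy(&entry, data.data() + offset, sizeof(entry));
      if (entry.d_tag == DT_NULL) {
        break;  // End of the dynamic array.
      }
      if (entry.d_tag == DT_NEEDED) {
        size_t nameOffset = strtab.sh_offset + entry.d_un.d_val;
        if (nameOffset >= data.size()) {
          continue;
        }
        // Names in .dynstr are NUL-terminated; never read past the file.
        const char* begin = data.data() + nameOffset;
        const char* end = std::find(begin, fileEnd, '\0');
        needed.emplace_back(begin, end);
      }
    }
  }
  return needed;
}
```

Reading sonames this way would let the rootfs builder follow whatever {{/bin/sh}} actually links against (e.g. {{libtinfo.so.6}} on Fedora 24) instead of relying on a hardcoded {{libtinfo.so.5}}.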
[jira] [Created] (MESOS-6182) LinuxRootfs::create ignores failures from adding non-existing files
Benjamin Bannier created MESOS-6182: --- Summary: LinuxRootfs::create ignores failures from adding non-existing files Key: MESOS-6182 URL: https://issues.apache.org/jira/browse/MESOS-6182 Project: Mesos Issue Type: Bug Components: test Reporter: Benjamin Bannier Assignee: Benjamin Bannier {{LinuxRootfs::create}} attempts to add a hardcoded list of files to the created rootfs. However, if a file does not exist, no failure is raised and the file is simply missing from the rootfs. This can then lead to failures in tests using the rootfs and relying on files in it. We should make failures to compose the planned rootfs explicit so users of this test code know what they can rely on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5951) Remove "strict registry" code
[ https://issues.apache.org/jira/browse/MESOS-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-5951: --- Shepherd: Vinod Kone > Remove "strict registry" code > - > > Key: MESOS-5951 > URL: https://issues.apache.org/jira/browse/MESOS-5951 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > Once {{PARTITION_AWARE}} frameworks are supported, we should eventually > remove the code that supports the "non-strict" semantics in the master. That > is: > 1. The master will be "strict" in Mesos 1.1, in the sense that master > behavior will always reflect the content of the registry and will not change > depending on whether the master has failed over. The exception here is that > for non-PARTITION_AWARE frameworks, we will _only_ kill such tasks on a > reregistering agent if the master hasn't failed over in the meantime. i.e., > we'll remain backwards compatible with the previous "non-strict" semantics > that old frameworks might depend on. > 2. The "strict" semantics will be less problematic, because the master will > no longer be killing tasks and shutting down agents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6182) LinuxRootfs::create ignores failures from adding non-existing files
[ https://issues.apache.org/jira/browse/MESOS-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496264#comment-15496264 ] Benjamin Bannier commented on MESOS-6182: - Linking MESOS-6011 as an example of tests failing because {{LinuxRootfs::create}} does not create a deterministic rootfs. > LinuxRootfs::create ignores failures from adding non-existing files > --- > > Key: MESOS-6182 > URL: https://issues.apache.org/jira/browse/MESOS-6182 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier > > {{LinuxRootfs::create}} attempts to add a hardcoded list of files to the > created rootfs. However, if a file does not exist no failure is created, but > the file will be missing from the rootfs. > This can then lead to failures in tests using the rootfs and relying on files > in it. > We should make failures to compose the planned rootfs explicit so users of > this test code know what they can rely on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early
[ https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495932#comment-15495932 ] haosdent commented on MESOS-6180: - [~greggomann] I used {{grep 'W:'}} and {{grep -v 'W:'}} to filter the stdout/stderr of MESOS-6164, MESOS-6165, and MESOS-6166. It looks like their logs do not overlap. Do you have any examples of overlap that do not fit this? > Several tests are flaky, with futures timing out early > -- > > Key: MESOS-6180 > URL: https://issues.apache.org/jira/browse/MESOS-6180 > Project: Mesos > Issue Type: Bug > Components: tests >Reporter: Greg Mann >Assignee: haosdent > Labels: mesosphere, tests > Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, > CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log > > > Following the merging of a large patch chain, it was noticed on our internal > CI that several tests had become flaky, with a similar pattern in the > failures: the tests fail early when a future times out. Often, this occurs > when a test cluster is being spun up and one of the offer futures times out. > This has been observed in the following tests: > * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward > * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward > * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch > * RoleTest.ImplicitRoleRegister > * SlaveRecoveryTest/0.MultipleFrameworks > * SlaveRecoveryTest/0.ReconcileShutdownFramework > * SlaveTest.ContainerizerUsageFailure > * MesosSchedulerDriverTest.ExplicitAcknowledgements > * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164) > * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165) > * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166) > See the linked JIRAs noted above for individual tickets addressing a couple > of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6149) Checkpoint used subsystems for containers
[ https://issues.apache.org/jira/browse/MESOS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495794#comment-15495794 ] haosdent commented on MESOS-6149: - Currently, the agent cleans up orphaned isolator state when the corresponding isolator is enabled again. For example: 1. Start the agent with {{--isolation=A,B}} and launch container {{x}}. 2. Restart the agent with {{--isolation=A}} and destroy container {{x}}; {{x}} now leaves behind state in {{B}} that still needs to be cleaned up. 3. Restart the agent with {{--isolation=B}}; the agent cleans up the leaked state for {{x}} during the recovery stage of {{B}}. > Checkpoint used subsystems for containers > - > > Key: MESOS-6149 > URL: https://issues.apache.org/jira/browse/MESOS-6149 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent >Assignee: haosdent > > In MESOS-6063, we have tracked recovered and prepared subsystems for > containers. To make it work better, we could checkpoint this information and > recover it after Agent restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
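One way to picture the proposal: checkpoint, per container, which isolators (or subsystems) prepared it, so that an isolator that comes back can tell exactly which dead containers it still owes cleanup for. The in-memory model below replays the three-step scenario from the comment; the `Checkpoint` struct and its `prepare`/`destroy`/`recover` methods are hypothetical names, and a plain `std::map` stands in for whatever on-disk checkpoint the agent would actually use.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of checkpointed isolators per container. A real agent
// would persist `used` to disk so that it survives restarts.
struct Checkpoint {
  std::map<std::string, std::set<std::string>> used;  // container -> isolators

  void prepare(const std::string& container, const std::set<std::string>& isolators) {
    used[container] = isolators;
  }

  // Destroy while only `enabled` isolators are loaded: those clean up their
  // part; any other isolator stays recorded as owing cleanup.
  void destroy(const std::string& container, const std::set<std::string>& enabled) {
    auto it = used.find(container);
    if (it == used.end()) {
      return;
    }
    for (const std::string& isolator : enabled) {
      it->second.erase(isolator);
    }
    if (it->second.empty()) {
      used.erase(it);
    }
  }

  // Recovery of `isolator`: clean up every container that still records it
  // but is no longer alive. Returns the containers that were cleaned.
  std::vector<std::string> recover(const std::string& isolator,
                                   const std::set<std::string>& alive) {
    std::vector<std::string> cleaned;
    for (auto it = used.begin(); it != used.end();) {
      if (alive.count(it->first) == 0 && it->second.erase(isolator) > 0) {
        cleaned.push_back(it->first);
      }
      it = it->second.empty() ? used.erase(it) : std::next(it);
    }
    return cleaned;
  }
};
```

With such a record, step 3 no longer depends on the isolator rediscovering leaked state on its own; it only has to consult the checkpoint.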
[jira] [Created] (MESOS-6181) The logic for BadACLNoPrincipal and BadACLDropCreateAndDestroy is not correct
Guangya Liu created MESOS-6181: -- Summary: The logic for BadACLNoPrincipal and BadACLDropCreateAndDestroy is not correct Key: MESOS-6181 URL: https://issues.apache.org/jira/browse/MESOS-6181 Project: Mesos Issue Type: Bug Reporter: Guangya Liu There are two issues with these two test cases: 1) There is no need to add `{}` in the test case; adding the `{}` causes the driver to decline a nonexistent offer. 2) If destroying the volume fails, we should get the last offer to make sure that the last offer still contains the volume resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5384) Improve error message for missing resources file
[ https://issues.apache.org/jira/browse/MESOS-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495703#comment-15495703 ] Kris Paprocki commented on MESOS-5384: --- Looking for a shepherd for this issue. > Improve error message for missing resources file > > > Key: MESOS-5384 > URL: https://issues.apache.org/jira/browse/MESOS-5384 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.28.1 > Environment: Centos 7 >Reporter: John Yost >Assignee: Kris Paprocki >Priority: Minor > Labels: easyfix, newbie > > Attempting to specify resources file via > --resources=/etc/mesos-slave/small-slave-config.json threw the following > error: > Failed to determine slave resources: Bad value for resources, missing or > extra ':' in /etc/mesos-slave/small-slave-config.json > I confirmed I had valid JSON: > [ > { > "name": "cpus", > "type": "SCALAR", > "scalar": { > "value": 0.5 > } > }, > { > "name": "mem", > "type": "SCALAR", > "scalar": { > "value": 512 > } > } > ] > In actuality, I misread the docs with my file pattern. Once I changed to > resources=file:///etc/mesos-slave/small-slave-config.json the mesos slave > started up fine. Just need a missing file check and corresponding error > message to fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
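The requested fix is small: detect when {{--resources}} refers to a file and report a file-specific error instead of the generic "missing or extra ':'" parse error. A minimal sketch, assuming the flag arrives as a raw string and that only the {{file://}} prefix (the form the reporter ended up using) selects a file; `loadResourcesFlag` and its messages are hypothetical and do not reflect the agent's actual flag-loading code.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical: return the resources specification for a --resources value,
// reading it from disk when the value uses the file:// scheme.
std::string loadResourcesFlag(const std::string& value) {
  const std::string scheme = "file://";
  if (value.rfind(scheme, 0) == 0) {  // Value starts with "file://".
    fs::path path = value.substr(scheme.size());
    std::ifstream in(path);
    if (!in) {
      throw std::runtime_error("Failed to read resources file '" + path.string() +
                               "': file is missing or unreadable");
    }
    std::stringstream contents;
    contents << in.rdbuf();
    return contents.str();
  }

  // An absolute path that exists on disk almost certainly lacks file://.
  std::error_code ec;
  if (!value.empty() && value.front() == '/' && fs::exists(value, ec)) {
    throw std::runtime_error("'" + value + "' looks like a file path; use 'file://" +
                             value + "' to load resources from a file");
  }

  // Otherwise treat the value as an inline resources string.
  return value;
}
```

Failing early with the file name in the message covers both the missing-file case and the reporter's case of a bare path to an existing JSON file.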
[jira] [Commented] (MESOS-5893) mesos-executor should adopt and reap orphan child processes
[ https://issues.apache.org/jira/browse/MESOS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495690#comment-15495690 ] Stéphane Cottin commented on MESOS-5893: Tini is a (sub)reaper commonly used in Docker containers; its source may help in implementing a process reaper in the executor. https://github.com/krallin/tini/blob/master/src/tini.c > mesos-executor should adopt and reap orphan child processes > --- > > Key: MESOS-5893 > URL: https://issues.apache.org/jira/browse/MESOS-5893 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.1.0 > Environment: mesos compiled from git master ( 1.1.0 ) > {{../configure --enable-ssl --enable-libevent --prefix=/usr --enable-optimize > --enable-silent-rules --enable-xfs-disk-isolator}} > isolators : > {{namespaces/pid,cgroups/cpu,cgroups/mem,filesystem/linux,docker/runtime,network/cni,docker/volume}} >Reporter: Stéphane Cottin > Labels: containerizer > > The mesos containerizer does not properly handle the death of children. > Discovered using marathon-lb: each topology update forks another haproxy; the > old haproxy process should die after its last client connection is > terminated, but it turns into a zombie.
> {noformat} > 7716 ?Ssl0:00 | \_ mesos-executor > --launcher_dir=/usr/libexec/mesos --sandbox_directory=/mnt/mesos/sandbox > --user=root --working_directory=/marathon-lb > --rootfs=/mnt/mesos/provisioner/containers/3b381d5c-7490-4dcd-ab4b-81051226075a/backends/overlay/rootfses/a4beacac-2d7e-445b-80c8-a9b4e480c491 > 7813 ?Ss 0:00 | | \_ sh -c /marathon-lb/run sse > --marathon https://marathon:8443 --auth-credentials user:pass --group > 'external' --ssl-certs /certs --max-serv-port-ip-per-task 20050 > 7823 ?S 0:00 | | | \_ /bin/bash /marathon-lb/run sse > --marathon https://marathon:8443 --auth-credentials user:pass --group > external --ssl-certs /certs --max-serv-port-ip-per-task 20050 > 7827 ?S 0:00 | | | \_ /usr/bin/runsv > /marathon-lb/service/haproxy > 7829 ?S 0:00 | | | | \_ /bin/bash ./run > 8879 ?S 0:00 | | | | \_ sleep 0.5 > 7828 ?Sl 0:00 | | | \_ python3 > /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config > /marathon-lb/haproxy.cfg --ssl-certs /certs --command sv reload > /marathon-lb/service/haproxy --sse --marathon https://marathon:8443 > --auth-credentials user:pass --group external --max-serv-port-ip-per-task > 20050 > 7906 ?Zs 0:00 | | \_ [haproxy] > 8628 ?Zs 0:00 | | \_ [haproxy] > 8722 ?Ss 0:00 | | \_ haproxy -p /tmp/haproxy.pid -f > /marathon-lb/haproxy.cfg -D -sf 144 52 > {noformat} > update: mesos-executor should be registered as a subreaper ( > http://man7.org/linux/man-pages/man2/prctl.2.html ) and propagate signals. > code sample: https://github.com/krallin/tini/blob/master/src/tini.c -- This message was sent by Atlassian JIRA (v6.3.4#6332)
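The update in the issue suggests registering {{mesos-executor}} as a child subreaper via prctl(2). A minimal Linux-only sketch of that pattern, the same idea tini uses: mark the process with `PR_SET_CHILD_SUBREAPER` so orphaned descendants (like the old haproxy processes above) are reparented to it, then collect exited children with non-blocking `waitpid(-1, ..., WNOHANG)`. Signal propagation, which the issue also asks for, is left out; `becomeSubreaper` and `reapZombies` are illustrative names, not Mesos functions.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Ask the kernel to reparent orphaned descendants to this process instead of
// to init (Linux >= 3.4). Returns false if prctl fails.
bool becomeSubreaper() {
  return prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == 0;
}

// Reap every child (own or adopted) that has already exited, without
// blocking. Meant to be called periodically or on SIGCHLD.
// Returns the number of processes reaped.
int reapZombies() {
  int reaped = 0;
  for (;;) {
    int status = 0;
    pid_t pid = waitpid(-1, &status, WNOHANG);
    if (pid > 0) {
      ++reaped;  // Collected one zombie.
    } else if (pid == -1 && errno == EINTR) {
      continue;  // Interrupted by a signal; retry.
    } else {
      break;     // 0: children remain but none exited; -1/ECHILD: no children.
    }
  }
  return reaped;
}
```

Because `WNOHANG` never blocks, the reaper can share a loop with the executor's other work; a production version would more likely trigger it from a `SIGCHLD` handler or signalfd.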
[jira] [Commented] (MESOS-6145) Isolator namespaces/pid is leaking mounts
[ https://issues.apache.org/jira/browse/MESOS-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495610#comment-15495610 ] haosdent commented on MESOS-6145: - The recover method is also incorrect: because we bind mount {{/var/empty/mesos}} to {{/var/run/mesos/pidns}}, {code} Try<list<string>> entries = os::ls(PID_NS_BIND_MOUNT_ROOT); if (entries.isError()) { return Failure("Failed to list existing containers in '" + string(PID_NS_BIND_MOUNT_ROOT) + "': " + entries.error()); } {code} cannot see the mount points under {{/var/run/mesos/pidns}}. > Isolator namespaces/pid is leaking mounts > - > > Key: MESOS-6145 > URL: https://issues.apache.org/jira/browse/MESOS-6145 > Project: Mesos > Issue Type: Bug > Components: containerization, isolation, security >Reporter: Stephan Erb >Assignee: Jie Yu > > As the operator of a Mesos cluster, I would like every container/executor to > run in a single PID namespace, so that a task cannot see what else is running > on the same host. > The existing {{namespaces/pid}} isolator seems to provide this feature. > However, it seems like it is leaking files. I have exactly one task running > currently, but there are still leftovers from earlier invocations > {code} > vagrant@aurora:~/aurora$ ls -l /var/run/mesos/pidns/ > total 0 > -rw-r--r-- 1 root root 0 Aug 26 20:30 32b6e4c7-3d22-47ed-a350-9eb929daa241 > -rw-r--r-- 1 root root 0 Aug 26 20:30 7b812f00-4614-4016-a76c-ff78a175a1b0 > -rw-r--r-- 1 root root 0 Aug 26 20:24 d501829e-7cf8-40fb-a895-0ad3416da7dc > -rw-r--r-- 1 root root 0 Aug 26 20:24 d56ca91f-eb72-426c-8bbb-f3239358a4ef > -r--r--r-- 1 root root 0 Aug 26 20:35 fef9a109-de52-45f3-ae41-171de6495705 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
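Since the bind mount of {{/var/empty/mesos}} shadows the directory contents, listing {{/var/run/mesos/pidns}} cannot reveal the per-container mounts. One alternative, offered only as a sketch, is to read them from the mount table instead: each line of {{/proc/self/mountinfo}} carries the mount point in its fifth field (see proc(5)), and shadowed mounts are still listed there. `mountsUnder` and `pidNamespaceMounts` are hypothetical helpers; this sketch does not decode the octal escapes (such as `\040` for a space) that mountinfo uses in paths.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Return the mount points from a mountinfo-formatted stream that lie strictly
// below `root` (the bind mount on `root` itself is excluded).
std::vector<std::string> mountsUnder(std::istream& mountinfo, const std::string& root) {
  const std::string prefix = root + "/";
  std::vector<std::string> result;
  std::string line;
  while (std::getline(mountinfo, line)) {
    std::istringstream fields(line);
    // Fields: mount ID, parent ID, major:minor, root, mount point, ...
    std::string mountId, parentId, majorMinor, fsRoot, mountPoint;
    if (!(fields >> mountId >> parentId >> majorMinor >> fsRoot >> mountPoint)) {
      continue;  // Skip malformed lines.
    }
    if (mountPoint.rfind(prefix, 0) == 0) {
      result.push_back(mountPoint);
    }
  }
  return result;
}

// Convenience wrapper over the calling process's live mount table.
std::vector<std::string> pidNamespaceMounts(const std::string& root) {
  std::ifstream in("/proc/self/mountinfo");
  return mountsUnder(in, root);
}
```

During recovery, the container IDs could then be taken from the last path component of each returned mount point rather than from a directory listing.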