[mesos] branch master updated: Removed unused `lib/cli/tasks.py` for new CLI.
This is an automated email from the ASF dual-hosted git repository.

klueska pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

The following commit(s) were added to refs/heads/master by this push:
     new fee4803  Removed unused `lib/cli/tasks.py` for new CLI.
fee4803 is described below

commit fee4803635776b8401a6deea3fd9b3d4133d30bc
Author: Armand Grillet
AuthorDate: Mon Oct 8 09:29:16 2018 -0400

    Removed unused `lib/cli/tasks.py` for new CLI.

    This file was introduced accidentally as part of
    051a138d08ba3b9e28fd6ec4e4f707cbd4bb1563

    Review: https://reviews.apache.org/r/68949/
---
 src/python/cli_new/lib/cli/tasks.py | 33 -
 1 file changed, 33 deletions(-)

diff --git a/src/python/cli_new/lib/cli/tasks.py b/src/python/cli_new/lib/cli/tasks.py
deleted file mode 100644
index 531e001..000
--- a/src/python/cli_new/lib/cli/tasks.py
+++ /dev/null
@@ -1,33 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""
-Functions to handle tasks.
-"""
-
-from cli import http
-from cli.exceptions import CLIException
-
-def get_tasks(master):
-    """
-    Get the tasks in a Mesos cluster.
-    """
-    try:
-        return http.get_json(master, "tasks")["tasks"]
-    except Exception as exception:
-        raise CLIException("Could not open '/tasks'"
-                           " endpoint at '{addr}': {error}"
-                           .format(addr=master, error=exception))
[mesos] branch master updated: Moved `get_agent_address` from `util.py` to `mesos.py` in new CLI.
klueska pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

The following commit(s) were added to refs/heads/master by this push:
     new aa98a49  Moved `get_agent_address` from `util.py` to `mesos.py` in new CLI.
aa98a49 is described below

commit aa98a4918eab5c46383d352aee16e0b8ed4e2b13
Author: Armand Grillet
AuthorDate: Mon Oct 8 09:33:07 2018 -0400

    Moved `get_agent_address` from `util.py` to `mesos.py` in new CLI.

    Review: https://reviews.apache.org/r/68950/
---
 src/python/cli_new/lib/cli/mesos.py | 19 +++
 src/python/cli_new/lib/cli/util.py  | 20
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/python/cli_new/lib/cli/mesos.py b/src/python/cli_new/lib/cli/mesos.py
index 068d694..7cf84bc 100644
--- a/src/python/cli_new/lib/cli/mesos.py
+++ b/src/python/cli_new/lib/cli/mesos.py
@@ -22,6 +22,24 @@ from cli import http
 from cli.exceptions import CLIException


+def get_agent_address(agent_id, master):
+    """
+    Given a master and an agent id, return the agent address
+    by checking the /slaves endpoint of the master.
+    """
+    try:
+        agents = http.get_json(master, "slaves")["slaves"]
+    except Exception as exception:
+        raise CLIException("Could not open '/slaves'"
+                           " endpoint at '{addr}': {error}"
+                           .format(addr=master,
+                                   error=exception))
+    for agent in agents:
+        if agent["id"] == agent_id:
+            return agent["pid"].split("@")[1]
+    raise CLIException("Unable to find agent '{id}'".format(id=agent_id))
+
+
 def get_agents(master):
     """
     Get the agents in a Mesos cluster.
@@ -44,6 +62,7 @@ def get_agents(master):
     return data[key]


+
 def get_tasks(master):
     """
     Get the tasks in a Mesos cluster.
diff --git a/src/python/cli_new/lib/cli/util.py b/src/python/cli_new/lib/cli/util.py
index 7cec7e4..e79268d 100644
--- a/src/python/cli_new/lib/cli/util.py
+++ b/src/python/cli_new/lib/cli/util.py
@@ -28,8 +28,6 @@ import textwrap

 from kazoo.client import KazooClient

-from cli import http
-
 from cli.exceptions import CLIException


@@ -186,24 +184,6 @@ def verify_address_format(address):
         raise CLIException("The port '{port}' is not valid")


-def get_agent_address(agent_id, master):
-    """
-    Given a master and an agent id, return the agent address
-    by checking the /slaves endpoint of the master.
-    """
-    try:
-        agents = http.get_json(master, "slaves")["slaves"]
-    except Exception as exception:
-        raise CLIException("Could not open '/slaves'"
-                           " endpoint at '{addr}': {error}"
-                           .format(addr=master,
-                                   error=exception))
-    for agent in agents:
-        if agent["id"] == agent_id:
-            return agent["pid"].split("@")[1]
-    raise CLIException("Unable to find agent '{id}'".format(id=agent_id))
-
-
 def join_plugin_paths(settings, config):
     """
     Return all the plugin paths combined
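
Note on the lookup in `get_agent_address`: the function scans the `slaves` list and keeps the part of the agent's `pid` after the `@`. The following is a minimal sketch of that step only, with the HTTP call left out. The name `find_agent_address`, the use of `LookupError` in place of `CLIException`, and the sample entries are assumptions made for illustration; they are not part of the commit.

```python
def find_agent_address(agent_id, agents):
    """Return the 'host:port' part of the pid of the agent with this id.

    `agents` stands in for the "slaves" list that the real function reads
    from the master's /slaves endpoint.
    """
    for agent in agents:
        if agent["id"] == agent_id:
            # The pid has the form 'name@host:port'; keep what follows '@'.
            return agent["pid"].split("@")[1]
    # The real code raises CLIException here; LookupError is a stand-in.
    raise LookupError("Unable to find agent '{id}'".format(id=agent_id))


# Hypothetical sample data, shaped like entries of the "slaves" list.
SAMPLE_AGENTS = [
    {"id": "agent-1", "pid": "slave(1)@10.0.0.5:5051"},
    {"id": "agent-2", "pid": "slave(1)@10.0.0.6:5051"},
]
```

With the sample data, `find_agent_address("agent-2", SAMPLE_AGENTS)` yields the address of the second entry, and an unknown id raises the error.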
[mesos] 02/02: Fused constructors of `MethodNotAllowed` into one.
alexr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 17c1d7d41a5489cc865f787deeeb2b91865b8fe1
Author: Alexander Rukletsov
AuthorDate: Sun Oct 7 16:31:56 2018 +0200

    Fused constructors of `MethodNotAllowed` into one.

    There is no good reason to provide two c-tors for `MethodNotAllowed`,
    with one taking `requestMethod` and one not. Instead, an `Option<>`
    can be used. This also removes the need for copy-paste in the c-tor
    body.

    Review: https://reviews.apache.org/r/68945
---
 3rdparty/libprocess/include/process/http.hpp | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/3rdparty/libprocess/include/process/http.hpp b/3rdparty/libprocess/include/process/http.hpp
index 00dc2fd..bbcd0ba 100644
--- a/3rdparty/libprocess/include/process/http.hpp
+++ b/3rdparty/libprocess/include/process/http.hpp
@@ -753,16 +753,9 @@ struct MethodNotAllowed : Response
   // According to RFC 2616, "An Allow header field MUST be present in a
   // 405 (Method Not Allowed) response".

-  explicit MethodNotAllowed(
-      const std::initializer_list<std::string>& allowedMethods)
-    : Response("405 Method Not Allowed.", Status::METHOD_NOT_ALLOWED)
-  {
-    headers["Allow"] = strings::join(", ", allowedMethods);
-  }
-
   MethodNotAllowed(
       const std::initializer_list<std::string>& allowedMethods,
-      const std::string& requestMethod)
+      const Option<std::string>& requestMethod = None())
     : Response(
         constructBody(allowedMethods, requestMethod),
         Status::METHOD_NOT_ALLOWED)
@@ -773,11 +766,15 @@ struct MethodNotAllowed : Response
 private:
   static std::string constructBody(
       const std::initializer_list<std::string>& allowedMethods,
-      const std::string& requestMethod)
+      const Option<std::string>& requestMethod)
   {
-    return "405 Method Not Allowed. Expecting one of { '" +
-           strings::join("', '", allowedMethods) + "' }, but received '" +
-           requestMethod + "'.";
+    return
+        "405 Method Not Allowed. Expecting one of { '" +
+        strings::join("', '", allowedMethods) + "' }" +
+        (requestMethod.isSome()
+           ? ", but received '" + requestMethod.get() + "'"
+           : "") +
+        ".";
   }
 };

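
Note on the fused `constructBody`: the body text now depends on whether a request method was supplied. The branching can be sketched in Python, the language of the CLI code in this digest, with `None` standing in for `None()`. This is an illustration of the string logic only, not the libprocess code, and `construct_body` is a name chosen here.

```python
def construct_body(allowed_methods, request_method=None):
    """Build the 405 response body; request_method=None mirrors None()."""
    body = "405 Method Not Allowed. Expecting one of { '"
    body += "', '".join(allowed_methods) + "' }"
    if request_method is not None:
        # Only mention the received method when the caller supplied one.
        body += ", but received '" + request_method + "'"
    return body + "."
```

As far as the diff shows, a caller that passes only the allowed methods now gets the "Expecting one of" text instead of the bare "405 Method Not Allowed." body that the removed one-argument constructor produced.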
[mesos] 01/02: Used delegating constructors in `Response` types.
This is an automated email from the ASF dual-hosted git repository. alexr pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/mesos.git commit 4bf72d9851e924589ae769c9706eae0e82f2a254 Author: Alexander Rukletsov AuthorDate: Sun Oct 7 15:55:13 2018 +0200 Used delegating constructors in `Response` types. For clarity and brevity, use delegating constructors (available since C++11) in descendants of the `Response` class. Review: https://reviews.apache.org/r/68944 --- 3rdparty/libprocess/include/process/http.hpp | 29 +++- 1 file changed, 11 insertions(+), 18 deletions(-) diff --git a/3rdparty/libprocess/include/process/http.hpp b/3rdparty/libprocess/include/process/http.hpp index cef511a..00dc2fd 100644 --- a/3rdparty/libprocess/include/process/http.hpp +++ b/3rdparty/libprocess/include/process/http.hpp @@ -702,7 +702,7 @@ struct TemporaryRedirect : Response struct BadRequest : Response { BadRequest() -: Response("400 Bad Request.", Status::BAD_REQUEST) {} +: BadRequest("400 Bad Request.") {} explicit BadRequest(const std::string& body) : Response(body, Status::BAD_REQUEST) {} @@ -712,14 +712,7 @@ struct BadRequest : Response struct Unauthorized : Response { explicit Unauthorized(const std::vector& challenges) -: Response("401 Unauthorized.", Status::UNAUTHORIZED) - { -// TODO(arojas): Many HTTP client implementations do not support -// multiple challenges within a single 'WWW-Authenticate' header. -// Once MESOS-3306 is fixed, we can use multiple entries for the -// same header. 
-headers["WWW-Authenticate"] = strings::join(", ", challenges); - } +: Unauthorized(challenges, "401 Unauthorized.") {} Unauthorized( const std::vector& challenges, @@ -738,7 +731,7 @@ struct Unauthorized : Response struct Forbidden : Response { Forbidden() -: Response("403 Forbidden.", Status::FORBIDDEN) {} +: Forbidden("403 Forbidden.") {} explicit Forbidden(const std::string& body) : Response(body, Status::FORBIDDEN) {} @@ -748,7 +741,7 @@ struct Forbidden : Response struct NotFound : Response { NotFound() -: Response("404 Not Found.", Status::NOT_FOUND) {} +: NotFound("404 Not Found.") {} explicit NotFound(const std::string& body) : Response(body, Status::NOT_FOUND) {} @@ -792,7 +785,7 @@ private: struct NotAcceptable : Response { NotAcceptable() -: Response("406 Not Acceptable.", Status::NOT_ACCEPTABLE) {} +: NotAcceptable("406 Not Acceptable.") {} explicit NotAcceptable(const std::string& body) : Response(body, Status::NOT_ACCEPTABLE) {} @@ -802,7 +795,7 @@ struct NotAcceptable : Response struct Conflict : Response { Conflict() -: Response("409 Conflict.", Status::CONFLICT) {} +: Conflict("409 Conflict.") {} explicit Conflict(const std::string& body) : Response(body, Status::CONFLICT) {} @@ -812,7 +805,7 @@ struct Conflict : Response struct PreconditionFailed : Response { PreconditionFailed() -: Response("412 Precondition Failed.", Status::PRECONDITION_FAILED) {} +: PreconditionFailed("412 Precondition Failed.") {} explicit PreconditionFailed(const std::string& body) : Response(body, Status::PRECONDITION_FAILED) {} @@ -822,7 +815,7 @@ struct PreconditionFailed : Response struct UnsupportedMediaType : Response { UnsupportedMediaType() -: Response("415 Unsupported Media Type.", Status::UNSUPPORTED_MEDIA_TYPE) {} +: UnsupportedMediaType("415 Unsupported Media Type.") {} explicit UnsupportedMediaType(const std::string& body) : Response(body, Status::UNSUPPORTED_MEDIA_TYPE) {} @@ -832,7 +825,7 @@ struct UnsupportedMediaType : Response struct InternalServerError : 
Response { InternalServerError() -: Response("500 Internal Server Error.", Status::INTERNAL_SERVER_ERROR) {} +: InternalServerError("500 Internal Server Error.") {} explicit InternalServerError(const std::string& body) : Response(body, Status::INTERNAL_SERVER_ERROR) {} @@ -842,7 +835,7 @@ struct InternalServerError : Response struct NotImplemented : Response { NotImplemented() -: Response("501 Not Implemented.", Status::NOT_IMPLEMENTED) {} +: NotImplemented("501 Not Implemented.") {} explicit NotImplemented(const std::string& body) : Response(body, Status::NOT_IMPLEMENTED) {} @@ -852,7 +845,7 @@ struct NotImplemented : Response struct ServiceUnavailable : Response { ServiceUnavailable() -: Response("503 Service Unavailable.", Status::SERVICE_UNAVAILABLE) {} +: ServiceUnavailable("503 Service Unavailable.") {} explicit ServiceUnavailable(const std::string& body) : Response(body, Status::SERVICE_UNAVAILABLE) {}
[mesos] branch master updated (aa98a49 -> 17c1d7d)
alexr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.

    from aa98a49  Moved `get_agent_address` from `util.py` to `mesos.py` in new CLI.
     new 4bf72d9  Used delegating constructors in `Response` types.
     new 17c1d7d  Fused constructors of `MethodNotAllowed` into one.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 3rdparty/libprocess/include/process/http.hpp | 50 +++-
 1 file changed, 20 insertions(+), 30 deletions(-)
[mesos] 02/03: Added an unit test for agent recovery with new cgroup subsystems.
This is an automated email from the ASF dual-hosted git repository. gilbert pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/mesos.git commit 36a64c869cb04704047b86d3f8d11f1399aa8a8c Author: Gilbert Song AuthorDate: Fri Oct 5 12:19:01 2018 -0700 Added an unit test for agent recovery with new cgroup subsystems. Review: https://reviews.apache.org/r/68941 --- src/tests/containerizer/cgroups_isolator_tests.cpp | 147 + 1 file changed, 147 insertions(+) diff --git a/src/tests/containerizer/cgroups_isolator_tests.cpp b/src/tests/containerizer/cgroups_isolator_tests.cpp index 368ab93..fccab20 100644 --- a/src/tests/containerizer/cgroups_isolator_tests.cpp +++ b/src/tests/containerizer/cgroups_isolator_tests.cpp @@ -1904,6 +1904,153 @@ TEST_F(CgroupsIsolatorTest, ROOT_CGROUPS_AutoLoadSubsystems) } +// This test verifies that after the agent recovery/upgrade, nested +// containers could still be launched under old containers which +// were launched before agent restarts if there are new cgroup +// subsystems are added in the agent cgroup isolation. +TEST_F(CgroupsIsolatorTest, ROOT_CGROUPS_AgentRecoveryWithNewCgroupSubsystems) +{ + // Disable AuthN on the agent. + slave::Flags flags = CreateSlaveFlags(); + flags.isolation = "filesystem/linux,docker/runtime,cgroups/mem"; + flags.image_providers = "docker"; + flags.authenticate_http_readwrite = false; + + Try> master = StartMaster(); + ASSERT_SOME(master); + + Owned detector = master.get()->createDetector(); + + // Start the slave with a static process ID. This allows the executor to + // reconnect with the slave upon a process restart. 
+ const string id("agent"); + + Try> slave = StartSlave(detector.get(), id, flags); + ASSERT_SOME(slave); + + auto scheduler = std::make_shared(); + + v1::FrameworkInfo frameworkInfo = v1::DEFAULT_FRAMEWORK_INFO; + frameworkInfo.set_checkpoint(true); + + EXPECT_CALL(*scheduler, connected(_)) +.WillOnce(v1::scheduler::SendSubscribe(frameworkInfo)); + + Future subscribed; + EXPECT_CALL(*scheduler, subscribed(_, _)) +.WillOnce(FutureArg<1>(&subscribed)); + + Future offers1; + EXPECT_CALL(*scheduler, offers(_, _)) +.WillOnce(FutureArg<1>(&offers1)) +.WillRepeatedly(Return()); + + EXPECT_CALL(*scheduler, heartbeat(_)) +.WillRepeatedly(Return()); // Ignore heartbeats. + + v1::scheduler::TestMesos mesos( + master.get()->pid, ContentType::PROTOBUF, scheduler); + + AWAIT_READY(subscribed); + v1::FrameworkID frameworkId(subscribed->framework_id()); + + v1::ExecutorInfo executorInfo = v1::createExecutorInfo( + "test_default_executor", + None(), + "cpus:0.1;mem:32;disk:32", + v1::ExecutorInfo::DEFAULT); + + // Update `executorInfo` with the subscribed `frameworkId`. 
+ executorInfo.mutable_framework_id()->CopyFrom(frameworkId); + + AWAIT_READY(offers1); + ASSERT_FALSE(offers1->offers().empty()); + + const v1::Offer& offer1 = offers1->offers(0); + + v1::TaskInfo taskInfo1 = v1::createTask( + offer1.agent_id(), + v1::Resources::parse("cpus:0.1;mem:32;disk:32").get(), + "sleep 1000"); + + Future startingUpdate1; + Future runningUpdate1; + EXPECT_CALL(*scheduler, update(_, _)) +.WillOnce(DoAll( +FutureArg<1>(&startingUpdate1), +v1::scheduler::SendAcknowledge(frameworkId, offer1.agent_id( +.WillOnce(DoAll( +FutureArg<1>(&runningUpdate1), +v1::scheduler::SendAcknowledge(frameworkId, offer1.agent_id( +.WillRepeatedly(Return()); + + mesos.send( + v1::createCallAccept( + frameworkId, + offer1, + {v1::LAUNCH_GROUP( + executorInfo, v1::createTaskGroupInfo({taskInfo1}))})); + + AWAIT_READY(startingUpdate1); + ASSERT_EQ(v1::TASK_STARTING, startingUpdate1->status().state()); + ASSERT_EQ(taskInfo1.task_id(), startingUpdate1->status().task_id()); + + AWAIT_READY(runningUpdate1); + ASSERT_EQ(v1::TASK_RUNNING, runningUpdate1->status().state()); + ASSERT_EQ(taskInfo1.task_id(), runningUpdate1->status().task_id()); + + slave.get()->terminate(); + slave->reset(); + + Future __recover = FUTURE_DISPATCH(_, &Slave::__recover); + + // Update the cgroup isolation to introduce new subsystems. + flags.isolation = "filesystem/linux,docker/runtime,cgroups/all"; + slave = this->StartSlave(detector.get(), id, flags); + ASSERT_SOME(slave); + + AWAIT_READY(__recover); + + Future offers2; + EXPECT_CALL(*scheduler, offers(_, _)) +.WillOnce(FutureArg<1>(&offers2)) +.WillRepeatedly(Return()); + + AWAIT_READY(offers2); + ASSERT_FALSE(offers2->offers().empty()); + + const v1::Offer& offer2 = offers2->offers(0); + + v1::TaskInfo taskInfo2 = v1::createTask( + offer2.agent_id(), + v1::Resources::parse("cpus:0.1;mem:32;disk:32").get(), + "sleep 1000"); + + Future startingUpdate2; + Future runningUpdate2; + EXPECT_CALL(*scheduler, update(_, _)) +.WillOnce(DoAll( +Fut
[mesos] 01/03: Fixed the nested container launch failure on the agent upgrade case.
gilbert pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 200a532d33b647cc26d9566bbc1765bc039e699d
Author: Gilbert Song
AuthorDate: Thu Oct 4 16:54:24 2018 -0700

    Fixed the nested container launch failure on the agent upgrade case.

    If new cgroup subsystems are added after the agent upgrade or
    recovery, a new nested container launched under an old container
    (one launched before the recovery) would fail, because it cannot
    assign its pid to the nonexistent cgroup hierarchy. We should skip
    those new cgroup subsystems for nested containers under old
    containers.

    Review: https://reviews.apache.org/r/68929
---
 .../mesos/isolators/cgroups/cgroups.cpp | 34 +-
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
index 11dfbab..fbb1b43 100644
--- a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
+++ b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
@@ -340,10 +340,13 @@ Future<Nothing> CgroupsIsolatorProcess::___recover(
   // TODO(haosdent): Use foreachkey once MESOS-5037 is resolved.
   foreach (const string& hierarchy, subsystems.keys()) {
     if (!cgroups::exists(hierarchy, cgroup)) {
-      // This may occur if the executor has exited and the isolator
-      // has destroyed the cgroup but the agent dies before noticing
-      // this. This will be detected when the containerizer tries to
-      // monitor the executor's pid.
+      // This may occur in two cases:
+      // 1. If the executor has exited and the isolator has destroyed
+      //    the cgroup but the agent dies before noticing this. This
+      //    will be detected when the containerizer tries to monitor
+      //    the executor's pid.
+      // 2. After the agent recovery/upgrade, new cgroup subsystems
+      //    are added to the agent cgroup isolation configuration.
       LOG(WARNING) << "Couldn't find the cgroup '" << cgroup << "' "
                    << "in hierarchy '" << hierarchy << "' "
                    << "for container " << containerId;
@@ -677,18 +680,33 @@ Future<Nothing> CgroupsIsolatorProcess::isolate(
     return Failure("Failed to isolate the container: Unknown root container");
   }

+  const string& cgroup = infos[rootContainerId]->cgroup;
+
   // TODO(haosdent): Use foreachkey once MESOS-5037 is resolved.
   foreach (const string& hierarchy, subsystems.keys()) {
+    // If new cgroup subsystems are added after the agent
+    // upgrade, the newly added cgroup subsystems do not
+    // exist on old container's cgroup hierarchy. So skip
+    // assigning the pid to this cgroup subsystem.
+    if (containerId.has_parent() && !cgroups::exists(hierarchy, cgroup)) {
+      LOG(INFO) << "Skipping assigning pid " << stringify(pid)
+                << " to cgroup at '" << path::join(hierarchy, cgroup)
+                << "' for container " << containerId
+                << " because its parent container " << containerId.parent()
+                << " does not have this cgroup hierarchy";
+      continue;
+    }
+
     Try<Nothing> assign = cgroups::assign(
         hierarchy,
-        infos[rootContainerId]->cgroup,
+        cgroup,
         pid);

     if (assign.isError()) {
       string message =
-        "Failed to assign pid " + stringify(pid) + " to cgroup at "
-        "'" + path::join(hierarchy, infos[rootContainerId]->cgroup) + "'"
-        ": " + assign.error();
+        "Failed to assign container " + stringify(containerId) +
+        " pid " + stringify(pid) + " to cgroup at '" +
+        path::join(hierarchy, cgroup) + "': " + assign.error();

       LOG(ERROR) << message;

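
Note on the skip rule in `isolate()`: a hierarchy is skipped only when the container is nested and the root container's cgroup is missing from that hierarchy. Top-level containers still attempt every assignment, so a missing cgroup remains an error for them. Here is a small Python sketch of the selection, assuming a caller-supplied `exists` predicate in place of `cgroups::exists`. The function name and the sample paths are hypothetical.

```python
def hierarchies_to_assign(hierarchies, cgroup, is_nested, exists):
    """Return the hierarchies whose cgroup the pid should be assigned to.

    `exists(hierarchy, cgroup)` is a stand-in for cgroups::exists().
    """
    selected = []
    for hierarchy in hierarchies:
        if is_nested and not exists(hierarchy, cgroup):
            # Subsystem added after the parent was launched: skip it.
            continue
        selected.append(hierarchy)
    return selected


# Hypothetical state: the parent was created with only the memory subsystem.
EXISTING = {("/sys/fs/cgroup/memory", "mesos/parent")}


def sample_exists(hierarchy, cgroup):
    """Look up the (hierarchy, cgroup) pair in the made-up EXISTING set."""
    return (hierarchy, cgroup) in EXISTING
```

Passing `is_nested=False` returns every hierarchy unchanged, which matches the unchanged behavior for top-level containers in the diff.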
[mesos] 03/03: Added MESOS-9295 to 1.7.1 CHANGELOG.
gilbert pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit e135b7f2175c01fd67b71b22cdc325ac37853a9d
Author: Gilbert Song
AuthorDate: Mon Oct 8 10:34:27 2018 -0700

    Added MESOS-9295 to 1.7.1 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 6a47201..8756474 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -15,6 +15,7 @@ Release Notes - Mesos - Version 1.7.1 (WIP)
   * [MESOS-9274] - v1 JAVA scheduler library can drop TEARDOWN upon destruction.
   * [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
   * [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
+  * [MESOS-9295] - Nested container launch could fail if the agent upgrade with new cgroup subsystems.

 ** Improvement:
   * [MESOS-6765] - Make the Resources wrapper "copy-on-write" to improve performance.
[mesos] branch master updated (17c1d7d -> e135b7f)
gilbert pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.

    from 17c1d7d  Fused constructors of `MethodNotAllowed` into one.
     new 200a532  Fixed the nested container launch failure on the agent upgrade case.
     new 36a64c8  Added an unit test for agent recovery with new cgroup subsystems.
     new e135b7f  Added MESOS-9295 to 1.7.1 CHANGELOG.

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.

Summary of changes:
 CHANGELOG                                          |   1 +
 .../mesos/isolators/cgroups/cgroups.cpp            |  34 +++--
 src/tests/containerizer/cgroups_isolator_tests.cpp | 147 +
 3 files changed, 174 insertions(+), 8 deletions(-)
[mesos] 02/02: Added MESOS-9295 to 1.7.1 CHANGELOG.
gilbert pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 3e185f14075b827b8ea3b03a48ed2cd136ce8158
Author: Gilbert Song
AuthorDate: Mon Oct 8 10:34:27 2018 -0700

    Added MESOS-9295 to 1.7.1 CHANGELOG.

    (cherry picked from commit e135b7f2175c01fd67b71b22cdc325ac37853a9d)
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 268..75be171 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -15,6 +15,7 @@ Release Notes - Mesos - Version 1.7.1 (WIP)
   * [MESOS-9274] - v1 JAVA scheduler library can drop TEARDOWN upon destruction.
   * [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
   * [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
+  * [MESOS-9295] - Nested container launch could fail if the agent upgrade with new cgroup subsystems.

 ** Improvement:
   * [MESOS-6765] - Make the Resources wrapper "copy-on-write" to improve performance.
[mesos] branch 1.7.x updated (1ecc3c6 -> 3e185f1)
gilbert pushed a change to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.

    from 1ecc3c6  Added MESOS-9283 to the 1.7.x CHANGELOG.
     new e9a2d1a  Fixed the nested container launch failure on the agent upgrade case.
     new 3e185f1  Added MESOS-9295 to 1.7.1 CHANGELOG.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.

Summary of changes:
 CHANGELOG                               |  1 +
 .../mesos/isolators/cgroups/cgroups.cpp | 34 +-
 2 files changed, 27 insertions(+), 8 deletions(-)
[mesos] 01/02: Fixed the nested container launch failure on the agent upgrade case.
gilbert pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit e9a2d1a7dbba1e7900417461a935b284243e79a4
Author: Gilbert Song
AuthorDate: Thu Oct 4 16:54:24 2018 -0700

    Fixed the nested container launch failure on the agent upgrade case.

    If new cgroup subsystems are added after the agent upgrade or
    recovery, a new nested container launched under an old container
    (one launched before the recovery) would fail, because it cannot
    assign its pid to the nonexistent cgroup hierarchy. We should skip
    those new cgroup subsystems for nested containers under old
    containers.

    Review: https://reviews.apache.org/r/68929
    (cherry picked from commit 200a532d33b647cc26d9566bbc1765bc039e699d)
---
 .../mesos/isolators/cgroups/cgroups.cpp | 34 +-
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
index 11dfbab..fbb1b43 100644
--- a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
+++ b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
@@ -340,10 +340,13 @@ Future<Nothing> CgroupsIsolatorProcess::___recover(
   // TODO(haosdent): Use foreachkey once MESOS-5037 is resolved.
   foreach (const string& hierarchy, subsystems.keys()) {
     if (!cgroups::exists(hierarchy, cgroup)) {
-      // This may occur if the executor has exited and the isolator
-      // has destroyed the cgroup but the agent dies before noticing
-      // this. This will be detected when the containerizer tries to
-      // monitor the executor's pid.
+      // This may occur in two cases:
+      // 1. If the executor has exited and the isolator has destroyed
+      //    the cgroup but the agent dies before noticing this. This
+      //    will be detected when the containerizer tries to monitor
+      //    the executor's pid.
+      // 2. After the agent recovery/upgrade, new cgroup subsystems
+      //    are added to the agent cgroup isolation configuration.
       LOG(WARNING) << "Couldn't find the cgroup '" << cgroup << "' "
                    << "in hierarchy '" << hierarchy << "' "
                    << "for container " << containerId;
@@ -677,18 +680,33 @@ Future<Nothing> CgroupsIsolatorProcess::isolate(
     return Failure("Failed to isolate the container: Unknown root container");
   }

+  const string& cgroup = infos[rootContainerId]->cgroup;
+
   // TODO(haosdent): Use foreachkey once MESOS-5037 is resolved.
   foreach (const string& hierarchy, subsystems.keys()) {
+    // If new cgroup subsystems are added after the agent
+    // upgrade, the newly added cgroup subsystems do not
+    // exist on old container's cgroup hierarchy. So skip
+    // assigning the pid to this cgroup subsystem.
+    if (containerId.has_parent() && !cgroups::exists(hierarchy, cgroup)) {
+      LOG(INFO) << "Skipping assigning pid " << stringify(pid)
+                << " to cgroup at '" << path::join(hierarchy, cgroup)
+                << "' for container " << containerId
+                << " because its parent container " << containerId.parent()
+                << " does not have this cgroup hierarchy";
+      continue;
+    }
+
     Try<Nothing> assign = cgroups::assign(
         hierarchy,
-        infos[rootContainerId]->cgroup,
+        cgroup,
         pid);

     if (assign.isError()) {
       string message =
-        "Failed to assign pid " + stringify(pid) + " to cgroup at "
-        "'" + path::join(hierarchy, infos[rootContainerId]->cgroup) + "'"
-        ": " + assign.error();
+        "Failed to assign container " + stringify(containerId) +
+        " pid " + stringify(pid) + " to cgroup at '" +
+        path::join(hierarchy, cgroup) + "': " + assign.error();

       LOG(ERROR) << message;

[mesos] 03/04: Enabled `--fetch_stall_timeout` in curl-based URI fetcher plugins.
This is an automated email from the ASF dual-hosted git repository. chhsiao pushed a commit to branch 1.5.x in repository https://gitbox.apache.org/repos/asf/mesos.git commit 97f73a9e844f9f37d54a97c8993dbf05cffc9592 Author: Chun-Hung Hsiao AuthorDate: Wed Mar 28 22:47:58 2018 -0700 Enabled `--fetch_stall_timeout` in curl-based URI fetcher plugins. This patch passes the `--fetch_stall_timeout` agent flag into `DockerFetcherPlugin` (through setting flag `docker_stall_timeout` in the Docker store) and `CurlFetcherPlugin` (through setting flag `curl_stall_timeout` in the Appc store). Review: https://reviews.apache.org/r/65876/ --- .../containerizer/mesos/provisioner/appc/store.cpp | 5 +- .../mesos/provisioner/docker/store.cpp | 1 + src/uri/fetchers/curl.cpp | 23 +++- src/uri/fetchers/curl.hpp | 12 +++- src/uri/fetchers/docker.cpp| 64 -- src/uri/fetchers/docker.hpp| 1 + 6 files changed, 85 insertions(+), 21 deletions(-) diff --git a/src/slave/containerizer/mesos/provisioner/appc/store.cpp b/src/slave/containerizer/mesos/provisioner/appc/store.cpp index c1f9661..f30c166 100644 --- a/src/slave/containerizer/mesos/provisioner/appc/store.cpp +++ b/src/slave/containerizer/mesos/provisioner/appc/store.cpp @@ -131,7 +131,10 @@ Try> Store::create( // TODO(jojy): Uri fetcher has 'shared' semantics for the // provisioner. It's a shared pointer which needs to be injected // from top level into the store (instead of being created here). 
- Try> uriFetcher = uri::fetcher::create(); + uri::fetcher::Flags _flags; + _flags.curl_stall_timeout = flags.fetcher_stall_timeout; + + Try> uriFetcher = uri::fetcher::create(_flags); if (uriFetcher.isError()) { return Error("Failed to create uri fetcher: " + uriFetcher.error()); } diff --git a/src/slave/containerizer/mesos/provisioner/docker/store.cpp b/src/slave/containerizer/mesos/provisioner/docker/store.cpp index d64e6eb..d277cc6 100644 --- a/src/slave/containerizer/mesos/provisioner/docker/store.cpp +++ b/src/slave/containerizer/mesos/provisioner/docker/store.cpp @@ -141,6 +141,7 @@ Try> Store::create( // TODO(dpravat): Remove after resolving MESOS-5473. #ifndef __WINDOWS__ _flags.docker_config = flags.docker_config; + _flags.docker_stall_timeout = flags.fetcher_stall_timeout; #endif Try> fetcher = uri::fetcher::create(_flags); diff --git a/src/uri/fetchers/curl.cpp b/src/uri/fetchers/curl.cpp index f34daf2..2f67a86 100644 --- a/src/uri/fetchers/curl.cpp +++ b/src/uri/fetchers/curl.cpp @@ -54,6 +54,16 @@ using process::Subprocess; namespace mesos { namespace uri { +CurlFetcherPlugin::Flags::Flags() +{ + add(&Flags::curl_stall_timeout, + "curl_stall_timeout", + "Amount of time for the fetcher to wait before considering a download\n" + "being too slow and abort it when the download stalls (i.e., the speed\n" + "keeps below one byte per second).\n"); +} + + const char CurlFetcherPlugin::NAME[] = "curl"; @@ -61,7 +71,7 @@ Try> CurlFetcherPlugin::create(const Flags& flags) { // TODO(jieyu): Make sure curl is available. - return Owned(new CurlFetcherPlugin()); + return Owned(new CurlFetcherPlugin(flags)); } @@ -98,7 +108,7 @@ Future CurlFetcherPlugin::fetch( // TODO(jieyu): Allow user to specify the name of the output file. const string output = path::join(directory, Path(uri.path()).basename()); - const vector argv = { + vector argv = { "curl", "-s", // Don't show progress meter or error messages. "-S", // Makes curl show an error message if it fails. 
@@ -108,6 +118,15 @@ Future CurlFetcherPlugin::fetch( strings::trim(stringify(uri)) }; + // Add a timeout for curl to abort when the download speed keeps low + // (1 byte per second by default) for the specified duration. See: + // https://curl.haxx.se/docs/manpage.html#-y + if (flags.curl_stall_timeout.isSome()) { +argv.push_back("-y"); +argv.push_back( +std::to_string(static_cast(flags.curl_stall_timeout->secs(; + } + Try s = subprocess( "curl", argv, diff --git a/src/uri/fetchers/curl.hpp b/src/uri/fetchers/curl.hpp index 909c2eb..e07c1e2 100644 --- a/src/uri/fetchers/curl.hpp +++ b/src/uri/fetchers/curl.hpp @@ -30,7 +30,13 @@ namespace uri { class CurlFetcherPlugin : public Fetcher::Plugin { public: - class Flags : public virtual flags::FlagsBase {}; + class Flags : public virtual flags::FlagsBase + { + public: +Flags(); + +Option curl_stall_timeout; + }; static const char NAME[]; @@ -48,7 +54,9 @@ public: const Option& data = None()) const; private: - CurlFetcherPlugin() {} + explicit CurlFetcherPlugin(const Flags& _flags) : flags(_flags) {} + + const Flags flags; }; } // namespace uri { diff --git a/src/uri/fetchers/docker.cpp
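
Note on the `-y` argument added in `CurlFetcherPlugin::fetch`: curl's `-y` (`--speed-time`) takes a whole number of seconds, and when no `-Y` is given curl uses a speed limit of 1 byte per second. The argv handling can be sketched in Python as below. The function name and the reduced flag set (`-s`, `-S`, `-o`) are simplifications chosen for this illustration; the plugin's full argument list is elided in the diff and is not reproduced here.

```python
def build_curl_argv(uri, output, stall_timeout_secs=None):
    """Build a simplified curl command line; None means no stall timeout."""
    argv = [
        "curl",
        "-s",          # Don't show progress meter or error messages.
        "-S",          # Makes curl show an error message if it fails.
        "-o", output,  # Write the download to this file.
        uri.strip(),
    ]
    if stall_timeout_secs is not None:
        # Abort when the speed stays low for this many seconds; truncating
        # to an integer mirrors the cast to whole seconds in the patch.
        argv.extend(["-y", str(int(stall_timeout_secs))])
    return argv
```

Leaving `stall_timeout_secs` unset keeps the command line free of `-y`, which corresponds to the `isSome()` check guarding the new arguments.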
[mesos] 01/04: Added the `stall_timeout` parameter to `net::download()`.
This is an automated email from the ASF dual-hosted git repository. chhsiao pushed a commit to branch 1.5.x in repository https://gitbox.apache.org/repos/asf/mesos.git commit f43d2cd58fb17cce26fdc050421d1d00b680c253 Author: Chun-Hung Hsiao AuthorDate: Wed Mar 28 22:47:46 2018 -0700 Added the `stall_timeout` parameter to `net::download()`. When the `stall_timeout` is given, the download would be aborted if the download speed keeps below 1 byte per second during the timeout. Review: https://reviews.apache.org/r/65855/ --- 3rdparty/stout/include/stout/net.hpp | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/3rdparty/stout/include/stout/net.hpp b/3rdparty/stout/include/stout/net.hpp index abb0144..d2992c0 100644 --- a/3rdparty/stout/include/stout/net.hpp +++ b/3rdparty/stout/include/stout/net.hpp @@ -48,6 +48,7 @@ #include #include +#include #include #include #include @@ -135,8 +136,12 @@ inline Try contentLength(const std::string& url) // Returns the HTTP response code resulting from attempting to // download the specified HTTP or FTP URL into a file at the specified -// path. -inline Try download(const std::string& url, const std::string& path) +// path. The `stall_timeout` parameter controls how long the download +// waits before aborting when the download speed keeps below 1 byte/sec. +inline Try download( +const std::string& url, +const std::string& path, +const Option& stall_timeout = None()) { initialize(); @@ -176,6 +181,16 @@ inline Try download(const std::string& url, const std::string& path) } curl_easy_setopt(curl, CURLOPT_WRITEDATA, file); + if (stall_timeout.isSome()) { +// Set the options to abort the download if the speed keeps below +// 1 byte/sec during the timeout. 
See: +// https://curl.haxx.se/libcurl/c/CURLOPT_LOW_SPEED_LIMIT.html +// https://curl.haxx.se/libcurl/c/CURLOPT_LOW_SPEED_TIME.html +curl_easy_setopt(curl, CURLOPT_LOW_SPEED_LIMIT, 1L); +curl_easy_setopt( +curl, CURLOPT_LOW_SPEED_TIME, static_cast(stall_timeout->secs())); + } + CURLcode curlErrorCode = curl_easy_perform(curl); if (curlErrorCode != 0) { curl_easy_cleanup(curl);
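The stall rule described in this commit ("aborted if the download speed keeps below 1 byte per second during the timeout") can be modelled in a few lines of Python. This is a simplified model for illustration only, not libcurl's implementation: it assumes one throughput sample per second, and the name `stall_abort_second` is hypothetical.

```python
from typing import Iterable, Optional


def stall_abort_second(
    bytes_per_second: Iterable[int],
    stall_timeout_secs: int,
    low_speed_limit: int = 1,
) -> Optional[int]:
    """Return the 1-based second at which the transfer would be aborted.

    Returns None if the transfer never stays below `low_speed_limit`
    bytes/sec for `stall_timeout_secs` consecutive seconds.
    """
    slow_run = 0  # Consecutive seconds spent below the speed limit.

    for second, transferred in enumerate(bytes_per_second, start=1):
        if transferred < low_speed_limit:
            slow_run += 1
            if slow_run >= stall_timeout_secs:
                return second
        else:
            # Any second at or above the limit resets the stall window.
            slow_run = 0

    return None
```

In this model a transfer that briefly recovers is not aborted, because the slow period must be continuous for the whole timeout.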
[mesos] 04/04: Added MESOS-8620 to the 1.5.2 CHANGELOG.
This is an automated email from the ASF dual-hosted git repository. chhsiao pushed a commit to branch 1.5.x in repository https://gitbox.apache.org/repos/asf/mesos.git commit d2ce21df0b0a9d008689eee74ff53cef4819a79c Author: Chun-Hung Hsiao AuthorDate: Fri Oct 5 15:23:10 2018 -0700 Added MESOS-8620 to the 1.5.2 CHANGELOG. --- CHANGELOG | 1 + 1 file changed, 1 insertion(+) diff --git a/CHANGELOG b/CHANGELOG index 68b6335..ca56fce 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -7,6 +7,7 @@ Release Notes - Mesos - Version 1.5.2 (WIP) * [MESOS-8418] - mesos-agent high cpu usage because of numerous /proc/mounts reads. * [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky. * [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER` + * [MESOS-8620] - Containers stuck in FETCHING possibly due to unresponsive server. * [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent volume data * [MESOS-8871] - Agent may fail to recover if the agent dies before image store cache checkpointed. * [MESOS-8904] - Master crash when removing quota.
[mesos] branch 1.5.x updated (ba960ed -> d2ce21d)
This is an automated email from the ASF dual-hosted git repository. chhsiao pushed a change to branch 1.5.x in repository https://gitbox.apache.org/repos/asf/mesos.git.

 from ba960ed Added a log line to `MesosContainerizer::kill()`.
 new f43d2cd Added the `stall_timeout` parameter to `net::download()`.
 new 0d05ffc Added `--fetcher_stall_timeout` to abort stalled artifact fetching.
 new 97f73a9 Enabled `--fetch_stall_timeout` in curl-based URI fetcher plugins.
 new d2ce21d Added MESOS-8620 to the 1.5.2 CHANGELOG.

The 4 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 3rdparty/stout/include/stout/net.hpp | 19 ++-
 CHANGELOG | 1 +
 docs/configuration/agent.md | 12
 include/mesos/fetcher/fetcher.proto | 3 +
 src/launcher/fetcher.cpp | 40 +-
 src/slave/constants.hpp | 3 +
 src/slave/containerizer/fetcher.cpp | 3 +
 .../containerizer/mesos/provisioner/appc/store.cpp | 5 +-
 .../mesos/provisioner/docker/store.cpp | 1 +
 src/slave/flags.cpp | 9 +++
 src/slave/flags.hpp | 1 +
 src/uri/fetchers/curl.cpp | 23 +++-
 src/uri/fetchers/curl.hpp | 12 +++-
 src/uri/fetchers/docker.cpp | 64 --
 src/uri/fetchers/docker.hpp | 1 +
 15 files changed, 161 insertions(+), 36 deletions(-)
[mesos] 02/04: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.
This is an automated email from the ASF dual-hosted git repository. chhsiao pushed a commit to branch 1.5.x in repository https://gitbox.apache.org/repos/asf/mesos.git commit 0d05ffc174f90aa8573869ab36bd338224121b42 Author: Chun-Hung Hsiao AuthorDate: Wed Mar 28 22:47:52 2018 -0700 Added `--fetcher_stall_timeout` to abort stalled artifact fetching. This flag specifies a timeout for `mesos-fetcher` to wait before aborting if the download speed keeps below 1 bytes/sec. This would avoid containers to get stuck at FETCHING. Review: https://reviews.apache.org/r/65856/ --- docs/configuration/agent.md | 12 +++ include/mesos/fetcher/fetcher.proto | 3 +++ src/launcher/fetcher.cpp| 40 + src/slave/constants.hpp | 3 +++ src/slave/containerizer/fetcher.cpp | 3 +++ src/slave/flags.cpp | 9 + src/slave/flags.hpp | 1 + 7 files changed, 58 insertions(+), 13 deletions(-) diff --git a/docs/configuration/agent.md b/docs/configuration/agent.md index 3cccf89..2a72a64 100644 --- a/docs/configuration/agent.md +++ b/docs/configuration/agent.md @@ -804,6 +804,18 @@ Size of the fetcher cache in Bytes. (default: 2GB) +--fetcher_stall_timeout=VALUE + + +Amount of time for the fetcher to wait before considering a download +being too slow and abort it when the download stalls (i.e., the speed +keeps below one byte per second). +NOTE: This feature only applies when downloading data from the net and +does not apply to HDFS. (default: 1mins) + + + + --frameworks_home=VALUE diff --git a/include/mesos/fetcher/fetcher.proto b/include/mesos/fetcher/fetcher.proto index 6a5d807..d668106 100644 --- a/include/mesos/fetcher/fetcher.proto +++ b/include/mesos/fetcher/fetcher.proto @@ -64,4 +64,7 @@ message FetcherInfo { repeated Item items = 3; optional string user = 4; optional string frameworks_home = 5; + + // Only applies when fetching artifacts from the net. 
+ optional DurationInfo stall_timeout = 6; } diff --git a/src/launcher/fetcher.cpp b/src/launcher/fetcher.cpp index e2372a1..06fa52c 100644 --- a/src/launcher/fetcher.cpp +++ b/src/launcher/fetcher.cpp @@ -165,7 +165,8 @@ static Try downloadWithHadoopClient( static Try downloadWithNet( const string& sourceUri, -const string& destinationPath) +const string& destinationPath, +const Option& stallTimeout) { // The net::download function only supports these protocols. CHECK(strings::startsWith(sourceUri, "http://") || @@ -176,7 +177,7 @@ static Try downloadWithNet( LOG(INFO) << "Downloading resource from '" << sourceUri << "' to '" << destinationPath << "'"; - Try code = net::download(sourceUri, destinationPath); + Try code = net::download(sourceUri, destinationPath, stallTimeout); if (code.isError()) { return Error("Error downloading resource: " + code.error()); } else { @@ -217,7 +218,8 @@ static Try download( const string& _sourceUri, const string& destinationPath, -const Option& frameworksHome) +const Option& frameworksHome, +const Option& stallTimeout) { // Trim leading whitespace for 'sourceUri'. const string sourceUri = strings::trim(_sourceUri, strings::PREFIX); @@ -243,7 +245,7 @@ static Try download( // 2. Try to fetch URI using os::net / libcurl implementation. // We consider http, https, ftp, ftps compatible with libcurl. if (Fetcher::isNetUri(sourceUri)) { -return downloadWithNet(sourceUri, destinationPath); +return downloadWithNet(sourceUri, destinationPath, stallTimeout); } // 3. Try to fetch the URI using hadoop client. 
@@ -286,7 +288,8 @@ static Try chmodExecutable(const string& filePath) static Try fetchBypassingCache( const CommandInfo::URI& uri, const string& sandboxDirectory, -const Option& frameworksHome) +const Option& frameworksHome, +const Option& stallTimeout) { LOG(INFO) << "Fetching directly into the sandbox directory"; @@ -315,7 +318,8 @@ static Try fetchBypassingCache( string path = path::join(sandboxDirectory, outputFile.get()); - Try downloaded = download(uri.value(), path, frameworksHome); + Try downloaded = +download(uri.value(), path, frameworksHome, stallTimeout); if (downloaded.isError()) { return Error(downloaded.error()); } @@ -404,7 +408,8 @@ static Try fetchThroughCache( const FetcherInfo::Item& item, const Option& cacheDirectory, const string& sandboxDirectory, -const Option& frameworksHome) +const Option& frameworksHome, +const Option& stallTimeout) { if (cacheDirectory.isNone() || cacheDirectory.get().empty()) { return Error("Cache directory not specified"); @@ -428,7 +433,8 @@ static Try fetchThroughCache( Try downloaded = download( item.uri().value(), path::join(cacheDirectory.get(), item.cache_filename
[mesos] branch master updated: Added MESOS-8620 to the 1.5.2 CHANGELOG.
This is an automated email from the ASF dual-hosted git repository. chhsiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/mesos.git The following commit(s) were added to refs/heads/master by this push: new 8375e42 Added MESOS-8620 to the 1.5.2 CHANGELOG. 8375e42 is described below commit 8375e426d1c07f625ee18cd439e0dd4f1dc804c5 Author: Chun-Hung Hsiao AuthorDate: Fri Oct 5 15:23:10 2018 -0700 Added MESOS-8620 to the 1.5.2 CHANGELOG. --- CHANGELOG | 1 + 1 file changed, 1 insertion(+) diff --git a/CHANGELOG b/CHANGELOG index 8756474..bb75df6 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -793,6 +793,7 @@ Release Notes - Mesos - Version 1.5.2 (WIP) * [MESOS-8418] - mesos-agent high cpu usage because of numerous /proc/mounts reads. * [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky. * [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER` + * [MESOS-8620] - Containers stuck in FETCHING possibly due to unresponsive server. * [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent volume data * [MESOS-8871] - Agent may fail to recover if the agent dies before image store cache checkpointed. * [MESOS-8904] - Master crash when removing quota.
[mesos] branch master updated: Updated verify-reviews.py to use current interpreter in subprocesses.
This is an automated email from the ASF dual-hosted git repository. vinodkone pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/mesos.git The following commit(s) were added to refs/heads/master by this push: new 568fcdf Updated verify-reviews.py to use current interpreter in subprocesses. 568fcdf is described below commit 568fcdfd29788d9df89a51ffae7969c2bf0ea173 Author: Armand Grillet AuthorDate: Mon Oct 8 15:01:28 2018 -0500 Updated verify-reviews.py to use current interpreter in subprocesses. This changes the command used in `support/verify-reviews.py` when running `support/apply-reviews.py` as a subprocess. It was previously `"python"`, which is generally Python 2, and is now `sys.executable`. That way, if verify-reviews.py is run with Python 3 (as it should), apply-reviews.py will be run with the same Python 3 interpreter. This should fix the `ImportError` issues we have recently seen in our CI. Review: https://reviews.apache.org/r/68951/ --- support/verify-reviews.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/support/verify-reviews.py b/support/verify-reviews.py index 56321ae..552dc36 100755 --- a/support/verify-reviews.py +++ b/support/verify-reviews.py @@ -94,7 +94,7 @@ def shell(command): def apply_review(review_id): """Apply a review using the script apply-reviews.py.""" print("Applying review %s" % review_id) -shell("python support/apply-reviews.py -n -r %s" % review_id) +shell("%s support/apply-reviews.py -n -r %s" % (sys.executable, review_id)) def apply_reviews(review_request, reviews, handler):
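The idea behind this change, reusing the running interpreter for child Python processes instead of whatever `python` resolves to on `PATH`, can be shown in isolation. The sketch below is not the code in `support/verify-reviews.py`: the helper name `run_with_current_interpreter` is hypothetical, and it passes an argument list rather than the shell string used by the script.

```python
import subprocess
import sys


def run_with_current_interpreter(script_args):
    """Run Python with the same interpreter that runs this process.

    `sys.executable` is the path of the current interpreter, so the
    child uses the same Python version and installed packages as the
    parent.
    """
    command = [sys.executable] + list(script_args)
    output = subprocess.check_output(command, stderr=subprocess.STDOUT)
    return output.decode("utf-8")
```

For example, `run_with_current_interpreter(["support/apply-reviews.py", "-n", "-r", "12345"])` would launch the script under the parent's interpreter; the review id here is made up for illustration.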
[mesos] branch master updated: Bring up the loopback interface using `iproute2`.
This is an automated email from the ASF dual-hosted git repository. gilbert pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/mesos.git The following commit(s) were added to refs/heads/master by this push: new d8de112 Bring up the loopback interface using `iproute2`. d8de112 is described below commit d8de1127bb4aa5cb36d2942cdea39e8ec620babe Author: Sergey Urbanovich AuthorDate: Mon Oct 8 16:42:41 2018 -0700 Bring up the loopback interface using `iproute2`. The last release of `net-tools` was released in 2001. The tools were deprecated years ago (see [1], [2], and [3]) and no longer installed by default in many linux-based operating systems. Bring up the loopback interface using `ip` from `iproute2` and if it fails to start fall back on `ifconfig` from `net-tools`. [1] https://lists.debian.org/debian-devel/2009/03/msg00780.html [2] https://bugzilla.redhat.com/show_bug.cgi?id=687920 [3] https://lwn.net/Articles/710533/ Review: https://reviews.apache.org/r/68921/ --- .../containerizer/mesos/isolators/network/cni/cni.cpp | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp b/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp index ba46552..64271df 100644 --- a/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp +++ b/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp @@ -2252,21 +2252,31 @@ int NetworkCniIsolatorSetup::execute() return EXIT_FAILURE; } -const Option status = os::spawn("ifconfig", {"ifconfig", "lo", "up"}); +// TODO(urbanserj): To get rid of all external dependencies such as +// `iproute2` and `net-tools`, use Netlink Protocol Library (libnl). 
+Option status = os::spawn( +"ip", {"ip", "link", "set", "dev", "lo", "up"}); const string message = "Failed to bring up the loopback interface in the new " "network namespace of pid " + stringify(flags.pid.get()); if (status.isNone()) { - cerr << message << ": " << "os::spawn failed: " + cerr << message << ": os::spawn 'ip link set dev lo up' failed: " + << os::strerror(errno) << endl; + + // Fall back on `ifconfig` if `ip` command fails to start. + status = os::spawn("ifconfig", {"ifconfig", "lo", "up"}); +} + +if (status.isNone()) { + cerr << message << ": os::spawn 'ifconfig lo up' failed: " << os::strerror(errno) << endl; return EXIT_FAILURE; } if (!WSUCCEEDED(status.get())) { - cerr << message << ": 'ifconfig lo up' " - << WSTRINGIFY(status.get()) << endl; + cerr << message << ": " << WSTRINGIFY(status.get()) << endl; return EXIT_FAILURE; } }
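The control flow of this patch is easy to misread in diff form, so here is a rough Python rendering of it. It is a sketch under assumptions, not the isolator code: `spawn` and `bring_up_loopback` are hypothetical names, and `spawn` returns `None` when the program cannot be started, loosely mirroring `os::spawn` returning `None` in the patch.

```python
import subprocess
from typing import Callable, List, Optional


def spawn(argv: List[str]) -> Optional[int]:
    """Run a command and return its exit status, or None if it cannot start."""
    try:
        return subprocess.call(argv)
    except OSError:
        return None


def bring_up_loopback(
    spawn_fn: Callable[[List[str]], Optional[int]] = spawn,
) -> bool:
    """Bring up `lo` with `ip`, falling back on `ifconfig`.

    The fallback is taken only when `ip` fails to start. If `ip` runs
    and exits with a non-zero status, that failure is reported as-is.
    """
    status = spawn_fn(["ip", "link", "set", "dev", "lo", "up"])

    if status is None:
        # `ip` could not be started (e.g. iproute2 is not installed).
        status = spawn_fn(["ifconfig", "lo", "up"])

    if status is None:
        return False  # Neither tool could be started.

    return status == 0
```

Taking the spawn function as a parameter keeps the fallback logic checkable without touching a real network interface.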
[mesos] branch master updated: Added a 1.7.0 performance improvements blog post.
This is an automated email from the ASF dual-hosted git repository. bmahler pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/mesos.git The following commit(s) were added to refs/heads/master by this push: new d0a4a08 Added a 1.7.0 performance improvements blog post. d0a4a08 is described below commit d0a4a08a3510722a17233d06661acbb063493db9 Author: Benjamin Mahler AuthorDate: Fri Oct 5 15:56:08 2018 -0700 Added a 1.7.0 performance improvements blog post. Review: https://reviews.apache.org/r/68940 --- ...7-performance-improvements-allocation-cycle.png | Bin 0 -> 100114 bytes ...7-performance-improvements-container-launch.png | Bin 0 -> 80005 bytes ...ce-improvements-containers-endpoint-latency.png | Bin 0 -> 93369 bytes ...ance-improvements-containers-endpoint-tasks.png | Bin 0 -> 75220 bytes ...1.7-performance-improvements-parallel-state.png | Bin 0 -> 404766 bytes .../1.7-performance-improvements-rapidjson.png | Bin 0 -> 92573 bytes ...8-10-08-mesos-1-7-0-performance-improvements.md | 115 + 7 files changed, 115 insertions(+) diff --git a/site/source/assets/img/blog/1.7-performance-improvements-allocation-cycle.png b/site/source/assets/img/blog/1.7-performance-improvements-allocation-cycle.png new file mode 100644 index 000..1a0eaea Binary files /dev/null and b/site/source/assets/img/blog/1.7-performance-improvements-allocation-cycle.png differ diff --git a/site/source/assets/img/blog/1.7-performance-improvements-container-launch.png b/site/source/assets/img/blog/1.7-performance-improvements-container-launch.png new file mode 100644 index 000..60ebc3f Binary files /dev/null and b/site/source/assets/img/blog/1.7-performance-improvements-container-launch.png differ diff --git a/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-latency.png b/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-latency.png new file mode 100644 index 000..20af929 Binary files /dev/null and 
b/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-latency.png differ diff --git a/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-tasks.png b/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-tasks.png new file mode 100644 index 000..de94b7a Binary files /dev/null and b/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-tasks.png differ diff --git a/site/source/assets/img/blog/1.7-performance-improvements-parallel-state.png b/site/source/assets/img/blog/1.7-performance-improvements-parallel-state.png new file mode 100644 index 000..1a81a7c Binary files /dev/null and b/site/source/assets/img/blog/1.7-performance-improvements-parallel-state.png differ diff --git a/site/source/assets/img/blog/1.7-performance-improvements-rapidjson.png b/site/source/assets/img/blog/1.7-performance-improvements-rapidjson.png new file mode 100644 index 000..b453c37 Binary files /dev/null and b/site/source/assets/img/blog/1.7-performance-improvements-rapidjson.png differ diff --git a/site/source/blog/2018-10-08-mesos-1-7-0-performance-improvements.md b/site/source/blog/2018-10-08-mesos-1-7-0-performance-improvements.md new file mode 100644 index 000..6780322 --- /dev/null +++ b/site/source/blog/2018-10-08-mesos-1-7-0-performance-improvements.md @@ -0,0 +1,115 @@ +--- +layout: post +title: Performance Improvements in Mesos 1.7.0 +published: true +post_author: + display_name: Benjamin Mahler + gravatar: fb43656d4d45f940160c3226c53309f5 + twitter: bmahler +tags: Performance +--- + +**Scalability and performance are key features for Mesos. Some users of Mesos already run production clusters that consist of many tens of thousands of nodes.** However, there remains a lot of room for improvement across a variety of areas of the system. + +The Mesos community has been working hard over the past few months to address several performance issues that have been affecting users. 
The following are some of the key performance improvements included in Mesos 1.7.0: + +* **Master `/state` endpoint:** Adopted [RapidJSON](http://rapidjson.org/) and reduced copying for a 2.3x throughput improvement due to a ~55% decrease in latency ([MESOS-9092](https://issues.apache.org/jira/browse/MESOS-9092)). Also, added parallel processing of `/state` requests to reduce master backlogging / interference under high request load ([MESOS-9122](https://issues.apache.org/jira/browse/MESOS-9122)). +* **Allocator:** in 1.7.1 (these patches did not make 1.7.0 and were backported to 1.7.x), allocation cycle time was reduced. Some benchmarks show an 80% reduction. This, together with the reduced master backlogging from `/state` improvements, substantially reduces the end-to-end offer cycling time between Mesos and schedulers. +* **Agent `/containers` endpoint:** Fixed a performance issue t