[mesos] branch master updated: Removed unused `lib/cli/tasks.py` for new CLI.

2018-10-08 Thread klueska
This is an automated email from the ASF dual-hosted git repository.

klueska pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new fee4803  Removed unused `lib/cli/tasks.py` for new CLI.
fee4803 is described below

commit fee4803635776b8401a6deea3fd9b3d4133d30bc
Author: Armand Grillet 
AuthorDate: Mon Oct 8 09:29:16 2018 -0400

Removed unused `lib/cli/tasks.py` for new CLI.

This file was introduced accidentally as part of
051a138d08ba3b9e28fd6ec4e4f707cbd4bb1563

Review: https://reviews.apache.org/r/68949/
---
 src/python/cli_new/lib/cli/tasks.py | 33 -
 1 file changed, 33 deletions(-)

diff --git a/src/python/cli_new/lib/cli/tasks.py b/src/python/cli_new/lib/cli/tasks.py
deleted file mode 100644
index 531e001..000
--- a/src/python/cli_new/lib/cli/tasks.py
+++ /dev/null
@@ -1,33 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""
-Functions to handle tasks.
-"""
-
-from cli import http
-from cli.exceptions import CLIException
-
-def get_tasks(master):
-    """
-    Get the tasks in a Mesos cluster.
-    """
-    try:
-        return http.get_json(master, "tasks")["tasks"]
-    except Exception as exception:
-        raise CLIException("Could not open '/tasks'"
-                           " endpoint at '{addr}': {error}"
-                           .format(addr=master, error=exception))



[mesos] branch master updated: Moved `get_agent_address` from `util.py` to `mesos.py` in new CLI.

2018-10-08 Thread klueska

klueska pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new aa98a49  Moved `get_agent_address` from `util.py` to `mesos.py` in new CLI.
aa98a49 is described below

commit aa98a4918eab5c46383d352aee16e0b8ed4e2b13
Author: Armand Grillet 
AuthorDate: Mon Oct 8 09:33:07 2018 -0400

Moved `get_agent_address` from `util.py` to `mesos.py` in new CLI.

Review: https://reviews.apache.org/r/68950/
---
 src/python/cli_new/lib/cli/mesos.py | 19 +++
 src/python/cli_new/lib/cli/util.py  | 20 
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/python/cli_new/lib/cli/mesos.py b/src/python/cli_new/lib/cli/mesos.py
index 068d694..7cf84bc 100644
--- a/src/python/cli_new/lib/cli/mesos.py
+++ b/src/python/cli_new/lib/cli/mesos.py
@@ -22,6 +22,24 @@ from cli import http
 from cli.exceptions import CLIException
 
 
+def get_agent_address(agent_id, master):
+    """
+    Given a master and an agent id, return the agent address
+    by checking the /slaves endpoint of the master.
+    """
+    try:
+        agents = http.get_json(master, "slaves")["slaves"]
+    except Exception as exception:
+        raise CLIException("Could not open '/slaves'"
+                           " endpoint at '{addr}': {error}"
+                           .format(addr=master,
+                                   error=exception))
+    for agent in agents:
+        if agent["id"] == agent_id:
+            return agent["pid"].split("@")[1]
+    raise CLIException("Unable to find agent '{id}'".format(id=agent_id))
+
+
 def get_agents(master):
     """
     Get the agents in a Mesos cluster.
@@ -44,6 +62,7 @@ def get_agents(master):
 
     return data[key]
 
+
 def get_tasks(master):
     """
     Get the tasks in a Mesos cluster.
diff --git a/src/python/cli_new/lib/cli/util.py b/src/python/cli_new/lib/cli/util.py
index 7cec7e4..e79268d 100644
--- a/src/python/cli_new/lib/cli/util.py
+++ b/src/python/cli_new/lib/cli/util.py
@@ -28,8 +28,6 @@ import textwrap
 
 from kazoo.client import KazooClient
 
-from cli import http
-
 from cli.exceptions import CLIException
 
 
@@ -186,24 +184,6 @@ def verify_address_format(address):
 raise CLIException("The port '{port}' is not valid")
 
 
-def get_agent_address(agent_id, master):
-    """
-    Given a master and an agent id, return the agent address
-    by checking the /slaves endpoint of the master.
-    """
-    try:
-        agents = http.get_json(master, "slaves")["slaves"]
-    except Exception as exception:
-        raise CLIException("Could not open '/slaves'"
-                           " endpoint at '{addr}': {error}"
-                           .format(addr=master,
-                                   error=exception))
-    for agent in agents:
-        if agent["id"] == agent_id:
-            return agent["pid"].split("@")[1]
-    raise CLIException("Unable to find agent '{id}'".format(id=agent_id))
-
-
 def join_plugin_paths(settings, config):
 """
 Return all the plugin paths combined
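
Note on the moved helper: `get_agent_address` looks up an agent by id in the
"slaves" list and returns the part of its `pid` after the "@". The sketch
below isolates that lookup so it can be run without a master. The name
`find_agent_address`, the local `CLIException` stand-in, and the sample data
are assumptions made for illustration; they are not part of the commit.

```python
class CLIException(Exception):
    """Stand-in for cli.exceptions.CLIException (assumed for this sketch)."""


def find_agent_address(agent_id, agents):
    """
    Return the address part of the pid of the agent with the given id.
    `agents` mirrors the "slaves" list of the master's /slaves response.
    """
    for agent in agents:
        if agent["id"] == agent_id:
            # A pid has the form '<name>@<ip>:<port>'; keep what follows '@'.
            return agent["pid"].split("@")[1]
    raise CLIException("Unable to find agent '{id}'".format(id=agent_id))


# Hypothetical sample data, not taken from a real cluster.
SAMPLE_AGENTS = [
    {"id": "agent-1", "pid": "slave(1)@10.0.0.5:5051"},
    {"id": "agent-2", "pid": "slave(1)@10.0.0.6:5051"},
]

print(find_agent_address("agent-2", SAMPLE_AGENTS))
```

An unknown id raises `CLIException`, matching the last line of the helper in
the diff.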



[mesos] 02/02: Fused constructors of `MethodNotAllowed` into one.

2018-10-08 Thread alexr

alexr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 17c1d7d41a5489cc865f787deeeb2b91865b8fe1
Author: Alexander Rukletsov 
AuthorDate: Sun Oct 7 16:31:56 2018 +0200

Fused constructors of `MethodNotAllowed` into one.

There is no good reason to provide two c-tors for `MethodNotAllowed`,
with one taking `requestMethod` and one not. Instead, an `Option<>`
can be used. This also removes the need for copy-paste in the c-tor
body.

Review: https://reviews.apache.org/r/68945
---
 3rdparty/libprocess/include/process/http.hpp | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/3rdparty/libprocess/include/process/http.hpp b/3rdparty/libprocess/include/process/http.hpp
index 00dc2fd..bbcd0ba 100644
--- a/3rdparty/libprocess/include/process/http.hpp
+++ b/3rdparty/libprocess/include/process/http.hpp
@@ -753,16 +753,9 @@ struct MethodNotAllowed : Response
   // According to RFC 2616, "An Allow header field MUST be present in a
   // 405 (Method Not Allowed) response".
 
-  explicit MethodNotAllowed(
-      const std::initializer_list<std::string>& allowedMethods)
-    : Response("405 Method Not Allowed.", Status::METHOD_NOT_ALLOWED)
-  {
-    headers["Allow"] = strings::join(", ", allowedMethods);
-  }
-
   MethodNotAllowed(
       const std::initializer_list<std::string>& allowedMethods,
-      const std::string& requestMethod)
+      const Option<std::string>& requestMethod = None())
     : Response(
         constructBody(allowedMethods, requestMethod),
         Status::METHOD_NOT_ALLOWED)
@@ -773,11 +766,15 @@ struct MethodNotAllowed : Response
 private:
   static std::string constructBody(
       const std::initializer_list<std::string>& allowedMethods,
-      const std::string& requestMethod)
+      const Option<std::string>& requestMethod)
   {
-    return "405 Method Not Allowed. Expecting one of { '" +
-           strings::join("', '", allowedMethods) + "' }, but received '" +
-           requestMethod + "'.";
+    return
+      "405 Method Not Allowed. Expecting one of { '" +
+      strings::join("', '", allowedMethods) + "' }" +
+      (requestMethod.isSome()
+         ? ", but received '" + requestMethod.get() + "'"
+         : "") +
+      ".";
   }
 };
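
Note on the fused constructor: the body text now depends on whether a request
method was supplied. A minimal Python sketch of the same string construction,
with `None` standing in for an empty `Option`, may make the two outputs easier
to see. The function name `construct_body` is illustrative only and is not a
libprocess API.

```python
def construct_body(allowed_methods, request_method=None):
    """Mirror the message built by `constructBody` in the diff above."""
    body = "405 Method Not Allowed. Expecting one of { '"
    body += "', '".join(allowed_methods) + "' }"
    if request_method is not None:
        # Only mention the received method when the caller supplied one.
        body += ", but received '" + request_method + "'"
    return body + "."


print(construct_body(["GET", "POST"]))
print(construct_body(["GET", "POST"], "PUT"))
```

One visible consequence in the diff: callers that pass only the allowed
methods no longer get the bare "405 Method Not Allowed." body of the removed
constructor, but the longer text that lists the expected methods.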
 



[mesos] 01/02: Used delegating constructors in `Response` types.

2018-10-08 Thread alexr

alexr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 4bf72d9851e924589ae769c9706eae0e82f2a254
Author: Alexander Rukletsov 
AuthorDate: Sun Oct 7 15:55:13 2018 +0200

Used delegating constructors in `Response` types.

For clarity and brevity, use delegating constructors (available
since C++11) in descendants of the `Response` class.

Review: https://reviews.apache.org/r/68944
---
 3rdparty/libprocess/include/process/http.hpp | 29 +++-
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/3rdparty/libprocess/include/process/http.hpp b/3rdparty/libprocess/include/process/http.hpp
index cef511a..00dc2fd 100644
--- a/3rdparty/libprocess/include/process/http.hpp
+++ b/3rdparty/libprocess/include/process/http.hpp
@@ -702,7 +702,7 @@ struct TemporaryRedirect : Response
 struct BadRequest : Response
 {
   BadRequest()
-: Response("400 Bad Request.", Status::BAD_REQUEST) {}
+: BadRequest("400 Bad Request.") {}
 
   explicit BadRequest(const std::string& body)
 : Response(body, Status::BAD_REQUEST) {}
@@ -712,14 +712,7 @@ struct BadRequest : Response
 struct Unauthorized : Response
 {
   explicit Unauthorized(const std::vector<std::string>& challenges)
-: Response("401 Unauthorized.", Status::UNAUTHORIZED)
-  {
-// TODO(arojas): Many HTTP client implementations do not support
-// multiple challenges within a single 'WWW-Authenticate' header.
-// Once MESOS-3306 is fixed, we can use multiple entries for the
-// same header.
-headers["WWW-Authenticate"] = strings::join(", ", challenges);
-  }
+: Unauthorized(challenges, "401 Unauthorized.") {}
 
   Unauthorized(
   const std::vector<std::string>& challenges,
@@ -738,7 +731,7 @@ struct Unauthorized : Response
 struct Forbidden : Response
 {
   Forbidden()
-: Response("403 Forbidden.", Status::FORBIDDEN) {}
+: Forbidden("403 Forbidden.") {}
 
   explicit Forbidden(const std::string& body)
 : Response(body, Status::FORBIDDEN) {}
@@ -748,7 +741,7 @@ struct Forbidden : Response
 struct NotFound : Response
 {
   NotFound()
-: Response("404 Not Found.", Status::NOT_FOUND) {}
+: NotFound("404 Not Found.") {}
 
   explicit NotFound(const std::string& body)
 : Response(body, Status::NOT_FOUND) {}
@@ -792,7 +785,7 @@ private:
 struct NotAcceptable : Response
 {
   NotAcceptable()
-: Response("406 Not Acceptable.", Status::NOT_ACCEPTABLE) {}
+: NotAcceptable("406 Not Acceptable.") {}
 
   explicit NotAcceptable(const std::string& body)
 : Response(body, Status::NOT_ACCEPTABLE) {}
@@ -802,7 +795,7 @@ struct NotAcceptable : Response
 struct Conflict : Response
 {
   Conflict()
-: Response("409 Conflict.", Status::CONFLICT) {}
+: Conflict("409 Conflict.") {}
 
   explicit Conflict(const std::string& body)
 : Response(body, Status::CONFLICT) {}
@@ -812,7 +805,7 @@ struct Conflict : Response
 struct PreconditionFailed : Response
 {
   PreconditionFailed()
-: Response("412 Precondition Failed.", Status::PRECONDITION_FAILED) {}
+: PreconditionFailed("412 Precondition Failed.") {}
 
   explicit PreconditionFailed(const std::string& body)
 : Response(body, Status::PRECONDITION_FAILED) {}
@@ -822,7 +815,7 @@ struct PreconditionFailed : Response
 struct UnsupportedMediaType : Response
 {
   UnsupportedMediaType()
-: Response("415 Unsupported Media Type.", Status::UNSUPPORTED_MEDIA_TYPE) {}
+: UnsupportedMediaType("415 Unsupported Media Type.") {}
 
   explicit UnsupportedMediaType(const std::string& body)
 : Response(body, Status::UNSUPPORTED_MEDIA_TYPE) {}
@@ -832,7 +825,7 @@ struct UnsupportedMediaType : Response
 struct InternalServerError : Response
 {
   InternalServerError()
-: Response("500 Internal Server Error.", Status::INTERNAL_SERVER_ERROR) {}
+: InternalServerError("500 Internal Server Error.") {}
 
   explicit InternalServerError(const std::string& body)
 : Response(body, Status::INTERNAL_SERVER_ERROR) {}
@@ -842,7 +835,7 @@ struct InternalServerError : Response
 struct NotImplemented : Response
 {
   NotImplemented()
-: Response("501 Not Implemented.", Status::NOT_IMPLEMENTED) {}
+: NotImplemented("501 Not Implemented.") {}
 
   explicit NotImplemented(const std::string& body)
 : Response(body, Status::NOT_IMPLEMENTED) {}
@@ -852,7 +845,7 @@ struct NotImplemented : Response
 struct ServiceUnavailable : Response
 {
   ServiceUnavailable()
-: Response("503 Service Unavailable.", Status::SERVICE_UNAVAILABLE) {}
+: ServiceUnavailable("503 Service Unavailable.") {}
 
   explicit ServiceUnavailable(const std::string& body)
 : Response(body, Status::SERVICE_UNAVAILABLE) {}



[mesos] branch master updated (aa98a49 -> 17c1d7d)

2018-10-08 Thread alexr

alexr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from aa98a49  Moved `get_agent_address` from `util.py` to `mesos.py` in new CLI.
 new 4bf72d9  Used delegating constructors in `Response` types.
 new 17c1d7d  Fused constructors of `MethodNotAllowed` into one.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 3rdparty/libprocess/include/process/http.hpp | 50 +++-
 1 file changed, 20 insertions(+), 30 deletions(-)



[mesos] 02/03: Added an unit test for agent recovery with new cgroup subsystems.

2018-10-08 Thread gilbert

gilbert pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 36a64c869cb04704047b86d3f8d11f1399aa8a8c
Author: Gilbert Song 
AuthorDate: Fri Oct 5 12:19:01 2018 -0700

Added an unit test for agent recovery with new cgroup subsystems.

Review: https://reviews.apache.org/r/68941
---
 src/tests/containerizer/cgroups_isolator_tests.cpp | 147 +
 1 file changed, 147 insertions(+)

diff --git a/src/tests/containerizer/cgroups_isolator_tests.cpp b/src/tests/containerizer/cgroups_isolator_tests.cpp
index 368ab93..fccab20 100644
--- a/src/tests/containerizer/cgroups_isolator_tests.cpp
+++ b/src/tests/containerizer/cgroups_isolator_tests.cpp
@@ -1904,6 +1904,153 @@ TEST_F(CgroupsIsolatorTest, ROOT_CGROUPS_AutoLoadSubsystems)
 }
 
 
+// This test verifies that after the agent recovery/upgrade, nested
+// containers could still be launched under old containers which
+// were launched before agent restarts if there are new cgroup
+// subsystems are added in the agent cgroup isolation.
+TEST_F(CgroupsIsolatorTest, ROOT_CGROUPS_AgentRecoveryWithNewCgroupSubsystems)
+{
+  // Disable AuthN on the agent.
+  slave::Flags flags = CreateSlaveFlags();
+  flags.isolation = "filesystem/linux,docker/runtime,cgroups/mem";
+  flags.image_providers = "docker";
+  flags.authenticate_http_readwrite = false;
+
+  Try> master = StartMaster();
+  ASSERT_SOME(master);
+
+  Owned detector = master.get()->createDetector();
+
+  // Start the slave with a static process ID. This allows the executor to
+  // reconnect with the slave upon a process restart.
+  const string id("agent");
+
+  Try> slave = StartSlave(detector.get(), id, flags);
+  ASSERT_SOME(slave);
+
+  auto scheduler = std::make_shared();
+
+  v1::FrameworkInfo frameworkInfo = v1::DEFAULT_FRAMEWORK_INFO;
+  frameworkInfo.set_checkpoint(true);
+
+  EXPECT_CALL(*scheduler, connected(_))
+.WillOnce(v1::scheduler::SendSubscribe(frameworkInfo));
+
+  Future subscribed;
+  EXPECT_CALL(*scheduler, subscribed(_, _))
+.WillOnce(FutureArg<1>(&subscribed));
+
+  Future offers1;
+  EXPECT_CALL(*scheduler, offers(_, _))
+.WillOnce(FutureArg<1>(&offers1))
+.WillRepeatedly(Return());
+
+  EXPECT_CALL(*scheduler, heartbeat(_))
+.WillRepeatedly(Return()); // Ignore heartbeats.
+
+  v1::scheduler::TestMesos mesos(
+  master.get()->pid, ContentType::PROTOBUF, scheduler);
+
+  AWAIT_READY(subscribed);
+  v1::FrameworkID frameworkId(subscribed->framework_id());
+
+  v1::ExecutorInfo executorInfo = v1::createExecutorInfo(
+  "test_default_executor",
+  None(),
+  "cpus:0.1;mem:32;disk:32",
+  v1::ExecutorInfo::DEFAULT);
+
+  // Update `executorInfo` with the subscribed `frameworkId`.
+  executorInfo.mutable_framework_id()->CopyFrom(frameworkId);
+
+  AWAIT_READY(offers1);
+  ASSERT_FALSE(offers1->offers().empty());
+
+  const v1::Offer& offer1 = offers1->offers(0);
+
+  v1::TaskInfo taskInfo1 = v1::createTask(
+  offer1.agent_id(),
+  v1::Resources::parse("cpus:0.1;mem:32;disk:32").get(),
+  "sleep 1000");
+
+  Future startingUpdate1;
+  Future runningUpdate1;
+  EXPECT_CALL(*scheduler, update(_, _))
+    .WillOnce(DoAll(
+        FutureArg<1>(&startingUpdate1),
+        v1::scheduler::SendAcknowledge(frameworkId, offer1.agent_id())))
+    .WillOnce(DoAll(
+        FutureArg<1>(&runningUpdate1),
+        v1::scheduler::SendAcknowledge(frameworkId, offer1.agent_id())))
+    .WillRepeatedly(Return());
+
+  mesos.send(
+  v1::createCallAccept(
+  frameworkId,
+  offer1,
+  {v1::LAUNCH_GROUP(
+  executorInfo, v1::createTaskGroupInfo({taskInfo1}))}));
+
+  AWAIT_READY(startingUpdate1);
+  ASSERT_EQ(v1::TASK_STARTING, startingUpdate1->status().state());
+  ASSERT_EQ(taskInfo1.task_id(), startingUpdate1->status().task_id());
+
+  AWAIT_READY(runningUpdate1);
+  ASSERT_EQ(v1::TASK_RUNNING, runningUpdate1->status().state());
+  ASSERT_EQ(taskInfo1.task_id(), runningUpdate1->status().task_id());
+
+  slave.get()->terminate();
+  slave->reset();
+
+  Future<Nothing> __recover = FUTURE_DISPATCH(_, &Slave::__recover);
+
+  // Update the cgroup isolation to introduce new subsystems.
+  flags.isolation = "filesystem/linux,docker/runtime,cgroups/all";
+  slave = this->StartSlave(detector.get(), id, flags);
+  ASSERT_SOME(slave);
+
+  AWAIT_READY(__recover);
+
+  Future offers2;
+  EXPECT_CALL(*scheduler, offers(_, _))
+.WillOnce(FutureArg<1>(&offers2))
+.WillRepeatedly(Return());
+
+  AWAIT_READY(offers2);
+  ASSERT_FALSE(offers2->offers().empty());
+
+  const v1::Offer& offer2 = offers2->offers(0);
+
+  v1::TaskInfo taskInfo2 = v1::createTask(
+  offer2.agent_id(),
+  v1::Resources::parse("cpus:0.1;mem:32;disk:32").get(),
+  "sleep 1000");
+
+  Future startingUpdate2;
+  Future runningUpdate2;
+  EXPECT_CALL(*scheduler, update(_, _))
+.WillOnce(DoAll(
+Fut

[mesos] 01/03: Fixed the nested container launch failure on the agent upgrade case.

2018-10-08 Thread gilbert

gilbert pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 200a532d33b647cc26d9566bbc1765bc039e699d
Author: Gilbert Song 
AuthorDate: Thu Oct 4 16:54:24 2018 -0700

Fixed the nested container launch failure on the agent upgrade case.

If new cgroup subsystems are added after the agent upgrade or
recovery, a new nested container launched under an old container
(one launched before the recovery) would fail, because its pid cannot
be assigned to a cgroup hierarchy that does not exist. We should skip
those new cgroup subsystems for nested containers under old containers.

Review: https://reviews.apache.org/r/68929
---
 .../mesos/isolators/cgroups/cgroups.cpp| 34 +-
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
index 11dfbab..fbb1b43 100644
--- a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
+++ b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
@@ -340,10 +340,13 @@ Future CgroupsIsolatorProcess::___recover(
   // TODO(haosdent): Use foreachkey once MESOS-5037 is resolved.
   foreach (const string& hierarchy, subsystems.keys()) {
 if (!cgroups::exists(hierarchy, cgroup)) {
-  // This may occur if the executor has exited and the isolator
-  // has destroyed the cgroup but the agent dies before noticing
-  // this. This will be detected when the containerizer tries to
-  // monitor the executor's pid.
+  // This may occur in two cases:
+  // 1. If the executor has exited and the isolator has destroyed
+  //the cgroup but the agent dies before noticing this. This
+  //will be detected when the containerizer tries to monitor
+  //the executor's pid.
+  // 2. After the agent recovery/upgrade, new cgroup subsystems
+  //are added to the agent cgroup isolation configuration.
   LOG(WARNING) << "Couldn't find the cgroup '" << cgroup << "' "
<< "in hierarchy '" << hierarchy << "' "
<< "for container " << containerId;
@@ -677,18 +680,33 @@ Future CgroupsIsolatorProcess::isolate(
 return Failure("Failed to isolate the container: Unknown root container");
   }
 
+  const string& cgroup = infos[rootContainerId]->cgroup;
+
   // TODO(haosdent): Use foreachkey once MESOS-5037 is resolved.
   foreach (const string& hierarchy, subsystems.keys()) {
+// If new cgroup subsystems are added after the agent
+// upgrade, the newly added cgroup subsystems do not
+// exist on old container's cgroup hierarchy. So skip
+// assigning the pid to this cgroup subsystem.
+if (containerId.has_parent() && !cgroups::exists(hierarchy, cgroup)) {
+  LOG(INFO) << "Skipping assigning pid " << stringify(pid)
+<< " to cgroup at '" << path::join(hierarchy, cgroup)
+<< "' for container " << containerId
+<< " because its parent container " << containerId.parent()
+<< " does not have this cgroup hierarchy";
+  continue;
+}
+
 Try<Nothing> assign = cgroups::assign(
 hierarchy,
-infos[rootContainerId]->cgroup,
+cgroup,
 pid);
 
 if (assign.isError()) {
   string message =
-"Failed to assign pid " + stringify(pid) + " to cgroup at "
-"'" + path::join(hierarchy, infos[rootContainerId]->cgroup) + "'"
-": " + assign.error();
+"Failed to assign container " + stringify(containerId) +
+" pid " + stringify(pid) + " to cgroup at '" +
+path::join(hierarchy, cgroup) + "': " + assign.error();
 
   LOG(ERROR) << message;
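
Note on the skip logic: the loop in `isolate` now ignores a hierarchy when the
container is nested and the root container's cgroup is missing from that
hierarchy. The Python sketch below models only that decision. The function
name, the `existing` set (standing in for `cgroups::exists`), and the sample
paths are assumptions made for illustration.

```python
def hierarchies_to_assign(hierarchies, cgroup, is_nested, existing):
    """
    Return the hierarchies to which the pid should be assigned.
    `existing` is a set of (hierarchy, cgroup) pairs that are present.
    """
    selected = []
    for hierarchy in hierarchies:
        # A nested container under an old parent has no cgroup in a
        # hierarchy that was added after the agent upgrade, so skip it.
        if is_nested and (hierarchy, cgroup) not in existing:
            continue
        selected.append(hierarchy)
    return selected


# Hypothetical layout: only the memory hierarchy existed before the upgrade.
HIERARCHIES = ["/sys/fs/cgroup/memory", "/sys/fs/cgroup/cpu"]
EXISTING = {("/sys/fs/cgroup/memory", "mesos/parent")}

print(hierarchies_to_assign(HIERARCHIES, "mesos/parent", True, EXISTING))
```

For a top-level container (`is_nested=False`) nothing is skipped, so a missing
cgroup would still surface as an assignment error, as in the diff.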
 



[mesos] 03/03: Added MESOS-9295 to 1.7.1 CHANGELOG.

2018-10-08 Thread gilbert

gilbert pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit e135b7f2175c01fd67b71b22cdc325ac37853a9d
Author: Gilbert Song 
AuthorDate: Mon Oct 8 10:34:27 2018 -0700

Added MESOS-9295 to 1.7.1 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 6a47201..8756474 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -15,6 +15,7 @@ Release Notes - Mesos - Version 1.7.1 (WIP)
   * [MESOS-9274] - v1 JAVA scheduler library can drop TEARDOWN upon destruction.
   * [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
   * [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
+  * [MESOS-9295] - Nested container launch could fail if the agent upgrade with new cgroup subsystems.
 
 ** Improvement:
   * [MESOS-6765] - Make the Resources wrapper "copy-on-write" to improve performance.



[mesos] branch master updated (17c1d7d -> e135b7f)

2018-10-08 Thread gilbert

gilbert pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 17c1d7d  Fused constructors of `MethodNotAllowed` into one.
 new 200a532  Fixed the nested container launch failure on the agent upgrade case.
 new 36a64c8  Added an unit test for agent recovery with new cgroup subsystems.
 new e135b7f  Added MESOS-9295 to 1.7.1 CHANGELOG.

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG  |   1 +
 .../mesos/isolators/cgroups/cgroups.cpp|  34 +++--
 src/tests/containerizer/cgroups_isolator_tests.cpp | 147 +
 3 files changed, 174 insertions(+), 8 deletions(-)



[mesos] 02/02: Added MESOS-9295 to 1.7.1 CHANGELOG.

2018-10-08 Thread gilbert

gilbert pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 3e185f14075b827b8ea3b03a48ed2cd136ce8158
Author: Gilbert Song 
AuthorDate: Mon Oct 8 10:34:27 2018 -0700

Added MESOS-9295 to 1.7.1 CHANGELOG.

(cherry picked from commit e135b7f2175c01fd67b71b22cdc325ac37853a9d)
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 268..75be171 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -15,6 +15,7 @@ Release Notes - Mesos - Version 1.7.1 (WIP)
   * [MESOS-9274] - v1 JAVA scheduler library can drop TEARDOWN upon destruction.
   * [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
   * [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
+  * [MESOS-9295] - Nested container launch could fail if the agent upgrade with new cgroup subsystems.
 
 ** Improvement:
   * [MESOS-6765] - Make the Resources wrapper "copy-on-write" to improve performance.



[mesos] branch 1.7.x updated (1ecc3c6 -> 3e185f1)

2018-10-08 Thread gilbert

gilbert pushed a change to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from 1ecc3c6  Added MESOS-9283 to the 1.7.x CHANGELOG.
 new e9a2d1a  Fixed the nested container launch failure on the agent upgrade case.
 new 3e185f1  Added MESOS-9295 to 1.7.1 CHANGELOG.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG  |  1 +
 .../mesos/isolators/cgroups/cgroups.cpp| 34 +-
 2 files changed, 27 insertions(+), 8 deletions(-)



[mesos] 01/02: Fixed the nested container launch failure on the agent upgrade case.

2018-10-08 Thread gilbert

gilbert pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit e9a2d1a7dbba1e7900417461a935b284243e79a4
Author: Gilbert Song 
AuthorDate: Thu Oct 4 16:54:24 2018 -0700

Fixed the nested container launch failure on the agent upgrade case.

If new cgroup subsystems are added after the agent upgrade or
recovery, a new nested container launched under an old container
(one launched before the recovery) would fail, because its pid cannot
be assigned to a cgroup hierarchy that does not exist. We should skip
those new cgroup subsystems for nested containers under old containers.

Review: https://reviews.apache.org/r/68929
(cherry picked from commit 200a532d33b647cc26d9566bbc1765bc039e699d)
---
 .../mesos/isolators/cgroups/cgroups.cpp| 34 +-
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
index 11dfbab..fbb1b43 100644
--- a/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
+++ b/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp
@@ -340,10 +340,13 @@ Future CgroupsIsolatorProcess::___recover(
   // TODO(haosdent): Use foreachkey once MESOS-5037 is resolved.
   foreach (const string& hierarchy, subsystems.keys()) {
 if (!cgroups::exists(hierarchy, cgroup)) {
-  // This may occur if the executor has exited and the isolator
-  // has destroyed the cgroup but the agent dies before noticing
-  // this. This will be detected when the containerizer tries to
-  // monitor the executor's pid.
+  // This may occur in two cases:
+  // 1. If the executor has exited and the isolator has destroyed
+  //the cgroup but the agent dies before noticing this. This
+  //will be detected when the containerizer tries to monitor
+  //the executor's pid.
+  // 2. After the agent recovery/upgrade, new cgroup subsystems
+  //are added to the agent cgroup isolation configuration.
   LOG(WARNING) << "Couldn't find the cgroup '" << cgroup << "' "
<< "in hierarchy '" << hierarchy << "' "
<< "for container " << containerId;
@@ -677,18 +680,33 @@ Future CgroupsIsolatorProcess::isolate(
 return Failure("Failed to isolate the container: Unknown root container");
   }
 
+  const string& cgroup = infos[rootContainerId]->cgroup;
+
   // TODO(haosdent): Use foreachkey once MESOS-5037 is resolved.
   foreach (const string& hierarchy, subsystems.keys()) {
+// If new cgroup subsystems are added after the agent
+// upgrade, the newly added cgroup subsystems do not
+// exist on old container's cgroup hierarchy. So skip
+// assigning the pid to this cgroup subsystem.
+if (containerId.has_parent() && !cgroups::exists(hierarchy, cgroup)) {
+  LOG(INFO) << "Skipping assigning pid " << stringify(pid)
+<< " to cgroup at '" << path::join(hierarchy, cgroup)
+<< "' for container " << containerId
+<< " because its parent container " << containerId.parent()
+<< " does not have this cgroup hierarchy";
+  continue;
+}
+
 Try<Nothing> assign = cgroups::assign(
 hierarchy,
-infos[rootContainerId]->cgroup,
+cgroup,
 pid);
 
 if (assign.isError()) {
   string message =
-"Failed to assign pid " + stringify(pid) + " to cgroup at "
-"'" + path::join(hierarchy, infos[rootContainerId]->cgroup) + "'"
-": " + assign.error();
+"Failed to assign container " + stringify(containerId) +
+" pid " + stringify(pid) + " to cgroup at '" +
+path::join(hierarchy, cgroup) + "': " + assign.error();
 
   LOG(ERROR) << message;
 



[mesos] 03/04: Enabled `--fetch_stall_timeout` in curl-based URI fetcher plugins.

2018-10-08 Thread chhsiao

chhsiao pushed a commit to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 97f73a9e844f9f37d54a97c8993dbf05cffc9592
Author: Chun-Hung Hsiao 
AuthorDate: Wed Mar 28 22:47:58 2018 -0700

Enabled `--fetch_stall_timeout` in curl-based URI fetcher plugins.

This patch passes the `--fetch_stall_timeout` agent flag into
`DockerFetcherPlugin` (through setting flag `docker_stall_timeout` in
the Docker store) and `CurlFetcherPlugin` (through setting flag
`curl_stall_timeout` in the Appc store).

Review: https://reviews.apache.org/r/65876/
---
 .../containerizer/mesos/provisioner/appc/store.cpp |  5 +-
 .../mesos/provisioner/docker/store.cpp |  1 +
 src/uri/fetchers/curl.cpp  | 23 +++-
 src/uri/fetchers/curl.hpp  | 12 +++-
 src/uri/fetchers/docker.cpp| 64 --
 src/uri/fetchers/docker.hpp|  1 +
 6 files changed, 85 insertions(+), 21 deletions(-)

diff --git a/src/slave/containerizer/mesos/provisioner/appc/store.cpp b/src/slave/containerizer/mesos/provisioner/appc/store.cpp
index c1f9661..f30c166 100644
--- a/src/slave/containerizer/mesos/provisioner/appc/store.cpp
+++ b/src/slave/containerizer/mesos/provisioner/appc/store.cpp
@@ -131,7 +131,10 @@ Try> Store::create(
   // TODO(jojy): Uri fetcher has 'shared' semantics for the
   // provisioner. It's a shared pointer which needs to be injected
   // from top level into the store (instead of being created here).
-  Try> uriFetcher = uri::fetcher::create();
+  uri::fetcher::Flags _flags;
+  _flags.curl_stall_timeout = flags.fetcher_stall_timeout;
+
+  Try> uriFetcher = uri::fetcher::create(_flags);
   if (uriFetcher.isError()) {
 return Error("Failed to create uri fetcher: " + uriFetcher.error());
   }
diff --git a/src/slave/containerizer/mesos/provisioner/docker/store.cpp b/src/slave/containerizer/mesos/provisioner/docker/store.cpp
index d64e6eb..d277cc6 100644
--- a/src/slave/containerizer/mesos/provisioner/docker/store.cpp
+++ b/src/slave/containerizer/mesos/provisioner/docker/store.cpp
@@ -141,6 +141,7 @@ Try> Store::create(
   // TODO(dpravat): Remove after resolving MESOS-5473.
 #ifndef __WINDOWS__
   _flags.docker_config = flags.docker_config;
+  _flags.docker_stall_timeout = flags.fetcher_stall_timeout;
 #endif
 
   Try> fetcher = uri::fetcher::create(_flags);
diff --git a/src/uri/fetchers/curl.cpp b/src/uri/fetchers/curl.cpp
index f34daf2..2f67a86 100644
--- a/src/uri/fetchers/curl.cpp
+++ b/src/uri/fetchers/curl.cpp
@@ -54,6 +54,16 @@ using process::Subprocess;
 namespace mesos {
 namespace uri {
 
+CurlFetcherPlugin::Flags::Flags()
+{
+  add(&Flags::curl_stall_timeout,
+  "curl_stall_timeout",
+  "Amount of time for the fetcher to wait before considering a download\n"
+  "being too slow and abort it when the download stalls (i.e., the speed\n"
+  "keeps below one byte per second).\n");
+}
+
+
 const char CurlFetcherPlugin::NAME[] = "curl";
 
 
@@ -61,7 +71,7 @@ Try> CurlFetcherPlugin::create(const Flags& flags)
 {
   // TODO(jieyu): Make sure curl is available.
 
-  return Owned(new CurlFetcherPlugin());
+  return Owned(new CurlFetcherPlugin(flags));
 }
 
 
@@ -98,7 +108,7 @@ Future CurlFetcherPlugin::fetch(
   // TODO(jieyu): Allow user to specify the name of the output file.
   const string output = path::join(directory, Path(uri.path()).basename());
 
-  const vector argv = {
+  vector argv = {
 "curl",
 "-s", // Don't show progress meter or error messages.
 "-S", // Makes curl show an error message if it fails.
@@ -108,6 +118,15 @@ Future CurlFetcherPlugin::fetch(
 strings::trim(stringify(uri))
   };
 
+  // Add a timeout for curl to abort when the download speed keeps low
+  // (1 byte per second by default) for the specified duration. See:
+  // https://curl.haxx.se/docs/manpage.html#-y
+  if (flags.curl_stall_timeout.isSome()) {
+argv.push_back("-y");
+argv.push_back(
+std::to_string(static_cast(flags.curl_stall_timeout->secs())));
+  }
+
   Try s = subprocess(
   "curl",
   argv,
diff --git a/src/uri/fetchers/curl.hpp b/src/uri/fetchers/curl.hpp
index 909c2eb..e07c1e2 100644
--- a/src/uri/fetchers/curl.hpp
+++ b/src/uri/fetchers/curl.hpp
@@ -30,7 +30,13 @@ namespace uri {
 class CurlFetcherPlugin : public Fetcher::Plugin
 {
 public:
-  class Flags : public virtual flags::FlagsBase {};
+  class Flags : public virtual flags::FlagsBase
+  {
+  public:
+Flags();
+
+Option curl_stall_timeout;
+  };
 
   static const char NAME[];
 
@@ -48,7 +54,9 @@ public:
   const Option& data = None()) const;
 
 private:
-  CurlFetcherPlugin() {}
+  explicit CurlFetcherPlugin(const Flags& _flags) : flags(_flags) {}
+
+  const Flags flags;
 };
 
 } // namespace uri {
diff --git a/src/uri/fetchers/docker.cpp 
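
The `curl.cpp` change above appends `-y <seconds>` to the curl command line only when a stall timeout is configured. The following is a minimal standalone sketch of that pattern, not the Mesos implementation: `buildCurlArgv` is a hypothetical helper, a null pointer stands in for an unset optional flag, and the `-L` and `-o` options are illustrative choices rather than a copy of the plugin's full argument list.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Hypothetical helper (not part of Mesos): builds a curl command line and
// appends `-y <seconds>` only when a stall timeout is supplied. A null
// pointer stands in for the "unset" case of the optional flag.
std::vector<std::string> buildCurlArgv(
    const std::string& uri,
    const std::string& output,
    const std::chrono::seconds* stallTimeout = nullptr)
{
  std::vector<std::string> argv = {
    "curl",
    "-s",        // Don't show progress meter or error messages.
    "-S",        // Show an error message if curl fails.
    "-L",        // Follow redirects (illustrative, assumed here).
    "-o", output,
    uri
  };

  if (stallTimeout != nullptr) {
    assert(stallTimeout->count() >= 0);

    // `-y` is curl's short form of `--speed-time`: abort when the
    // transfer stays below the speed limit for this many seconds.
    argv.push_back("-y");
    argv.push_back(std::to_string(stallTimeout->count()));
  }

  return argv;
}
```

Calling `buildCurlArgv(uri, output)` leaves the command line unchanged, while passing the address of a `std::chrono::seconds` value adds the two extra arguments at the end. Because curl accepts options after the URL, appending them last keeps the change small.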

[mesos] 01/04: Added the `stall_timeout` parameter to `net::download()`.

2018-10-08 Thread chhsiao

chhsiao pushed a commit to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit f43d2cd58fb17cce26fdc050421d1d00b680c253
Author: Chun-Hung Hsiao 
AuthorDate: Wed Mar 28 22:47:46 2018 -0700

Added the `stall_timeout` parameter to `net::download()`.

When `stall_timeout` is given, the download is aborted if the
download speed stays below 1 byte per second for the duration of the timeout.

Review: https://reviews.apache.org/r/65855/
---
 3rdparty/stout/include/stout/net.hpp | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/3rdparty/stout/include/stout/net.hpp b/3rdparty/stout/include/stout/net.hpp
index abb0144..d2992c0 100644
--- a/3rdparty/stout/include/stout/net.hpp
+++ b/3rdparty/stout/include/stout/net.hpp
@@ -48,6 +48,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -135,8 +136,12 @@ inline Try contentLength(const std::string& url)
 
 // Returns the HTTP response code resulting from attempting to
 // download the specified HTTP or FTP URL into a file at the specified
-// path.
-inline Try download(const std::string& url, const std::string& path)
+// path. The `stall_timeout` parameter controls how long the download
+// waits before aborting when the download speed keeps below 1 byte/sec.
+inline Try download(
+const std::string& url,
+const std::string& path,
+const Option& stall_timeout = None())
 {
   initialize();
 
@@ -176,6 +181,16 @@ inline Try download(const std::string& url, const std::string& path)
   }
   curl_easy_setopt(curl, CURLOPT_WRITEDATA, file);
 
+  if (stall_timeout.isSome()) {
+// Set the options to abort the download if the speed keeps below
+// 1 byte/sec during the timeout. See:
+// https://curl.haxx.se/libcurl/c/CURLOPT_LOW_SPEED_LIMIT.html
+// https://curl.haxx.se/libcurl/c/CURLOPT_LOW_SPEED_TIME.html
+curl_easy_setopt(curl, CURLOPT_LOW_SPEED_LIMIT, 1L);
+curl_easy_setopt(
+curl, CURLOPT_LOW_SPEED_TIME, static_cast(stall_timeout->secs()));
+  }
+
   CURLcode curlErrorCode = curl_easy_perform(curl);
   if (curlErrorCode != 0) {
 curl_easy_cleanup(curl);
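
To make the stall rule in this patch concrete, here is a simplified model of the "speed stays below the limit for the whole timeout" check. It is an assumption-laden sketch, not libcurl's implementation: `stalledAt` is a hypothetical function, throughput is sampled once per second, and libcurl's own speed accounting may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model (an assumption, not libcurl's code): returns the index
// of the second at which a transfer would be aborted, or -1 if it is never
// aborted. `bytesPerSecond[i]` is the number of bytes received in second i.
long stalledAt(
    const std::vector<long>& bytesPerSecond,
    long lowSpeedLimit,  // Plays the role of CURLOPT_LOW_SPEED_LIMIT.
    long lowSpeedTime)   // Plays the role of CURLOPT_LOW_SPEED_TIME.
{
  assert(lowSpeedTime > 0);

  long slowSeconds = 0;
  for (std::size_t i = 0; i < bytesPerSecond.size(); ++i) {
    if (bytesPerSecond[i] < lowSpeedLimit) {
      // Another consecutive second below the limit.
      ++slowSeconds;
      if (slowSeconds >= lowSpeedTime) {
        return static_cast<long>(i);
      }
    } else {
      // Any second at or above the limit resets the stall window.
      slowSeconds = 0;
    }
  }

  return -1;
}
```

With a limit of 1 byte/sec, only seconds with zero bytes count as slow, so a transfer that trickles even one byte per second is never aborted under this model. That matches the intent described in the commit message: the timeout targets stalled downloads, not merely slow ones.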



[mesos] 04/04: Added MESOS-8620 to the 1.5.2 CHANGELOG.

2018-10-08 Thread chhsiao

chhsiao pushed a commit to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit d2ce21df0b0a9d008689eee74ff53cef4819a79c
Author: Chun-Hung Hsiao 
AuthorDate: Fri Oct 5 15:23:10 2018 -0700

Added MESOS-8620 to the 1.5.2 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 68b6335..ca56fce 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -7,6 +7,7 @@ Release Notes - Mesos - Version 1.5.2 (WIP)
   * [MESOS-8418] - mesos-agent high cpu usage because of numerous /proc/mounts reads.
   * [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
   * [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
+  * [MESOS-8620] - Containers stuck in FETCHING possibly due to unresponsive server.
   * [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent volume data
   * [MESOS-8871] - Agent may fail to recover if the agent dies before image store cache checkpointed.
   * [MESOS-8904] - Master crash when removing quota.



[mesos] branch 1.5.x updated (ba960ed -> d2ce21d)

2018-10-08 Thread chhsiao

chhsiao pushed a change to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


from ba960ed  Added a log line to `MesosContainerizer::kill()`.
 new f43d2cd  Added the `stall_timeout` parameter to `net::download()`.
 new 0d05ffc  Added `--fetcher_stall_timeout` to abort stalled artifact fetching.
 new 97f73a9  Enabled `--fetch_stall_timeout` in curl-based URI fetcher plugins.
 new d2ce21d  Added MESOS-8620 to the 1.5.2 CHANGELOG.

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 3rdparty/stout/include/stout/net.hpp   | 19 ++-
 CHANGELOG  |  1 +
 docs/configuration/agent.md| 12 
 include/mesos/fetcher/fetcher.proto|  3 +
 src/launcher/fetcher.cpp   | 40 +-
 src/slave/constants.hpp|  3 +
 src/slave/containerizer/fetcher.cpp|  3 +
 .../containerizer/mesos/provisioner/appc/store.cpp |  5 +-
 .../mesos/provisioner/docker/store.cpp |  1 +
 src/slave/flags.cpp|  9 +++
 src/slave/flags.hpp|  1 +
 src/uri/fetchers/curl.cpp  | 23 +++-
 src/uri/fetchers/curl.hpp  | 12 +++-
 src/uri/fetchers/docker.cpp| 64 --
 src/uri/fetchers/docker.hpp|  1 +
 15 files changed, 161 insertions(+), 36 deletions(-)



[mesos] 02/04: Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

2018-10-08 Thread chhsiao

chhsiao pushed a commit to branch 1.5.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 0d05ffc174f90aa8573869ab36bd338224121b42
Author: Chun-Hung Hsiao 
AuthorDate: Wed Mar 28 22:47:52 2018 -0700

Added `--fetcher_stall_timeout` to abort stalled artifact fetching.

This flag specifies how long `mesos-fetcher` waits before aborting
a download whose speed stays below 1 byte/sec. This prevents
containers from getting stuck at FETCHING.

Review: https://reviews.apache.org/r/65856/
---
 docs/configuration/agent.md | 12 +++
 include/mesos/fetcher/fetcher.proto |  3 +++
 src/launcher/fetcher.cpp| 40 +
 src/slave/constants.hpp |  3 +++
 src/slave/containerizer/fetcher.cpp |  3 +++
 src/slave/flags.cpp |  9 +
 src/slave/flags.hpp |  1 +
 7 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/docs/configuration/agent.md b/docs/configuration/agent.md
index 3cccf89..2a72a64 100644
--- a/docs/configuration/agent.md
+++ b/docs/configuration/agent.md
@@ -804,6 +804,18 @@ Size of the fetcher cache in Bytes. (default: 2GB)
 
 
   
+--fetcher_stall_timeout=VALUE
+  
+  
+Amount of time for the fetcher to wait before considering a download
+being too slow and abort it when the download stalls (i.e., the speed
+keeps below one byte per second).
+NOTE: This feature only applies when downloading data from the net and
+does not apply to HDFS. (default: 1mins)
+  
+
+
+  
 --frameworks_home=VALUE
   
   
diff --git a/include/mesos/fetcher/fetcher.proto b/include/mesos/fetcher/fetcher.proto
index 6a5d807..d668106 100644
--- a/include/mesos/fetcher/fetcher.proto
+++ b/include/mesos/fetcher/fetcher.proto
@@ -64,4 +64,7 @@ message FetcherInfo {
   repeated Item items = 3;
   optional string user = 4;
   optional string frameworks_home = 5;
+
+  // Only applies when fetching artifacts from the net.
+  optional DurationInfo stall_timeout = 6;
 }
diff --git a/src/launcher/fetcher.cpp b/src/launcher/fetcher.cpp
index e2372a1..06fa52c 100644
--- a/src/launcher/fetcher.cpp
+++ b/src/launcher/fetcher.cpp
@@ -165,7 +165,8 @@ static Try downloadWithHadoopClient(
 
 static Try downloadWithNet(
 const string& sourceUri,
-const string& destinationPath)
+const string& destinationPath,
+const Option& stallTimeout)
 {
   // The net::download function only supports these protocols.
   CHECK(strings::startsWith(sourceUri, "http://")  ||
@@ -176,7 +177,7 @@ static Try downloadWithNet(
   LOG(INFO) << "Downloading resource from '" << sourceUri
 << "' to '" << destinationPath << "'";
 
-  Try code = net::download(sourceUri, destinationPath);
+  Try code = net::download(sourceUri, destinationPath, stallTimeout);
   if (code.isError()) {
 return Error("Error downloading resource: " + code.error());
   } else {
@@ -217,7 +218,8 @@ static Try copyFile(
 static Try download(
 const string& _sourceUri,
 const string& destinationPath,
-const Option& frameworksHome)
+const Option& frameworksHome,
+const Option& stallTimeout)
 {
   // Trim leading whitespace for 'sourceUri'.
   const string sourceUri = strings::trim(_sourceUri, strings::PREFIX);
@@ -243,7 +245,7 @@ static Try download(
   // 2. Try to fetch URI using os::net / libcurl implementation.
   // We consider http, https, ftp, ftps compatible with libcurl.
   if (Fetcher::isNetUri(sourceUri)) {
-return downloadWithNet(sourceUri, destinationPath);
+return downloadWithNet(sourceUri, destinationPath, stallTimeout);
   }
 
   // 3. Try to fetch the URI using hadoop client.
@@ -286,7 +288,8 @@ static Try chmodExecutable(const string& filePath)
 static Try fetchBypassingCache(
 const CommandInfo::URI& uri,
 const string& sandboxDirectory,
-const Option& frameworksHome)
+const Option& frameworksHome,
+const Option& stallTimeout)
 {
   LOG(INFO) << "Fetching directly into the sandbox directory";
 
@@ -315,7 +318,8 @@ static Try fetchBypassingCache(
 
   string path = path::join(sandboxDirectory, outputFile.get());
 
-  Try downloaded = download(uri.value(), path, frameworksHome);
+  Try downloaded =
+download(uri.value(), path, frameworksHome, stallTimeout);
   if (downloaded.isError()) {
 return Error(downloaded.error());
   }
@@ -404,7 +408,8 @@ static Try fetchThroughCache(
 const FetcherInfo::Item& item,
 const Option& cacheDirectory,
 const string& sandboxDirectory,
-const Option& frameworksHome)
+const Option& frameworksHome,
+const Option& stallTimeout)
 {
   if (cacheDirectory.isNone() || cacheDirectory.get().empty()) {
 return Error("Cache directory not specified");
@@ -428,7 +433,8 @@ static Try fetchThroughCache(
 Try downloaded = download(
 item.uri().value(),
 path::join(cacheDirectory.get(), item.cache_filename

[mesos] branch master updated: Added MESOS-8620 to the 1.5.2 CHANGELOG.

2018-10-08 Thread chhsiao

chhsiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new 8375e42  Added MESOS-8620 to the 1.5.2 CHANGELOG.
8375e42 is described below

commit 8375e426d1c07f625ee18cd439e0dd4f1dc804c5
Author: Chun-Hung Hsiao 
AuthorDate: Fri Oct 5 15:23:10 2018 -0700

Added MESOS-8620 to the 1.5.2 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 8756474..bb75df6 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -793,6 +793,7 @@ Release Notes - Mesos - Version 1.5.2 (WIP)
   * [MESOS-8418] - mesos-agent high cpu usage because of numerous /proc/mounts reads.
   * [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
   * [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
+  * [MESOS-8620] - Containers stuck in FETCHING possibly due to unresponsive server.
   * [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent volume data
   * [MESOS-8871] - Agent may fail to recover if the agent dies before image store cache checkpointed.
   * [MESOS-8904] - Master crash when removing quota.



[mesos] branch master updated: Updated verify-reviews.py to use current interpreter in subprocesses.

2018-10-08 Thread vinodkone

vinodkone pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new 568fcdf  Updated verify-reviews.py to use current interpreter in subprocesses.
568fcdf is described below

commit 568fcdfd29788d9df89a51ffae7969c2bf0ea173
Author: Armand Grillet 
AuthorDate: Mon Oct 8 15:01:28 2018 -0500

Updated verify-reviews.py to use current interpreter in subprocesses.

This changes the command used in `support/verify-reviews.py` when
running `support/apply-reviews.py` as a subprocess. It was previously
`"python"`, which is generally Python 2, and is now `sys.executable`.

That way, if verify-reviews.py is run with Python 3 (as it should be),
apply-reviews.py will be run with the same Python 3 interpreter. This
should fix the `ImportError` issues we have recently seen in our CI.

Review: https://reviews.apache.org/r/68951/
---
 support/verify-reviews.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/support/verify-reviews.py b/support/verify-reviews.py
index 56321ae..552dc36 100755
--- a/support/verify-reviews.py
+++ b/support/verify-reviews.py
@@ -94,7 +94,7 @@ def shell(command):
 def apply_review(review_id):
 """Apply a review using the script apply-reviews.py."""
 print("Applying review %s" % review_id)
-shell("python support/apply-reviews.py -n -r %s" % review_id)
+shell("%s support/apply-reviews.py -n -r %s" % (sys.executable, review_id))
 
 
 def apply_reviews(review_request, reviews, handler):



[mesos] branch master updated: Bring up the loopback interface using `iproute2`.

2018-10-08 Thread gilbert

gilbert pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new d8de112  Bring up the loopback interface using `iproute2`.
d8de112 is described below

commit d8de1127bb4aa5cb36d2942cdea39e8ec620babe
Author: Sergey Urbanovich 
AuthorDate: Mon Oct 8 16:42:41 2018 -0700

Bring up the loopback interface using `iproute2`.

The last release of `net-tools` was in 2001. The tools were
deprecated years ago (see [1], [2], and [3]) and are no longer installed
by default on many Linux-based operating systems.

Bring up the loopback interface using `ip` from `iproute2` and, if it
fails to start, fall back on `ifconfig` from `net-tools`.

[1] https://lists.debian.org/debian-devel/2009/03/msg00780.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=687920
[3] https://lwn.net/Articles/710533/

Review: https://reviews.apache.org/r/68921/
---
 .../containerizer/mesos/isolators/network/cni/cni.cpp  | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp b/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp
index ba46552..64271df 100644
--- a/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp
+++ b/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp
@@ -2252,21 +2252,31 @@ int NetworkCniIsolatorSetup::execute()
   return EXIT_FAILURE;
 }
 
-const Option status = os::spawn("ifconfig", {"ifconfig", "lo", "up"});
+// TODO(urbanserj): To get rid of all external dependencies such as
+// `iproute2` and `net-tools`, use Netlink Protocol Library (libnl).
+Option status = os::spawn(
+"ip", {"ip", "link", "set", "dev", "lo", "up"});
 
 const string message =
   "Failed to bring up the loopback interface in the new "
   "network namespace of pid " + stringify(flags.pid.get());
 
 if (status.isNone()) {
-  cerr << message << ": " << "os::spawn failed: "
+  cerr << message << ": os::spawn 'ip link set dev lo up' failed: "
+   << os::strerror(errno) << endl;
+
+  // Fall back on `ifconfig` if `ip` command fails to start.
+  status = os::spawn("ifconfig", {"ifconfig", "lo", "up"});
+}
+
+if (status.isNone()) {
+  cerr << message << ": os::spawn 'ifconfig lo up' failed: "
<< os::strerror(errno) << endl;
   return EXIT_FAILURE;
 }
 
 if (!WSUCCEEDED(status.get())) {
-  cerr << message << ": 'ifconfig lo up' "
-   << WSTRINGIFY(status.get()) << endl;
+  cerr << message << ": " << WSTRINGIFY(status.get()) << endl;
   return EXIT_FAILURE;
 }
   }
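
The fallback logic in this patch (try `ip`, and only if it cannot be started, try `ifconfig`) can be sketched in isolation as follows. This is a hedged illustration rather than the isolator code: `Command`, `SpawnResult`, and `spawnFirstAvailable` are hypothetical names, and the actual process launch is injected as a callback so the sketch stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One candidate command: the program name plus its full argv.
struct Command
{
  std::string program;
  std::vector<std::string> argv;
};

// Outcome of trying the candidates. `started` is false when none of the
// candidates could be launched at all.
struct SpawnResult
{
  bool started;
  int status;
  std::string program;
};

// Hypothetical helper: tries each candidate in order and stops at the first
// one that starts. `spawn` returns true and fills in the exit status when
// the program was launched, and false when it failed to start.
SpawnResult spawnFirstAvailable(
    const std::vector<Command>& candidates,
    const std::function<bool(const Command&, int*)>& spawn)
{
  for (const Command& candidate : candidates) {
    int status = 0;
    if (spawn(candidate, &status)) {
      // The program ran; its exit status is final, even if non-zero.
      return SpawnResult{true, status, candidate.program};
    }
  }

  return SpawnResult{false, -1, ""};
}
```

As in the patch, the fallback is taken only when a command fails to start. A command that starts but exits with a non-zero status is reported as a failure without trying the next candidate.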



[mesos] branch master updated: Added a 1.7.0 performance improvements blog post.

2018-10-08 Thread bmahler

bmahler pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/mesos.git


The following commit(s) were added to refs/heads/master by this push:
 new d0a4a08  Added a 1.7.0 performance improvements blog post.
d0a4a08 is described below

commit d0a4a08a3510722a17233d06661acbb063493db9
Author: Benjamin Mahler 
AuthorDate: Fri Oct 5 15:56:08 2018 -0700

Added a 1.7.0 performance improvements blog post.

Review: https://reviews.apache.org/r/68940
---
 ...7-performance-improvements-allocation-cycle.png | Bin 0 -> 100114 bytes
 ...7-performance-improvements-container-launch.png | Bin 0 -> 80005 bytes
 ...ce-improvements-containers-endpoint-latency.png | Bin 0 -> 93369 bytes
 ...ance-improvements-containers-endpoint-tasks.png | Bin 0 -> 75220 bytes
 ...1.7-performance-improvements-parallel-state.png | Bin 0 -> 404766 bytes
 .../1.7-performance-improvements-rapidjson.png | Bin 0 -> 92573 bytes
 ...8-10-08-mesos-1-7-0-performance-improvements.md | 115 +
 7 files changed, 115 insertions(+)

diff --git a/site/source/assets/img/blog/1.7-performance-improvements-allocation-cycle.png b/site/source/assets/img/blog/1.7-performance-improvements-allocation-cycle.png
new file mode 100644
index 000..1a0eaea
Binary files /dev/null and b/site/source/assets/img/blog/1.7-performance-improvements-allocation-cycle.png differ
diff --git a/site/source/assets/img/blog/1.7-performance-improvements-container-launch.png b/site/source/assets/img/blog/1.7-performance-improvements-container-launch.png
new file mode 100644
index 000..60ebc3f
Binary files /dev/null and b/site/source/assets/img/blog/1.7-performance-improvements-container-launch.png differ
diff --git a/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-latency.png b/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-latency.png
new file mode 100644
index 000..20af929
Binary files /dev/null and b/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-latency.png differ
diff --git a/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-tasks.png b/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-tasks.png
new file mode 100644
index 000..de94b7a
Binary files /dev/null and b/site/source/assets/img/blog/1.7-performance-improvements-containers-endpoint-tasks.png differ
diff --git a/site/source/assets/img/blog/1.7-performance-improvements-parallel-state.png b/site/source/assets/img/blog/1.7-performance-improvements-parallel-state.png
new file mode 100644
index 000..1a81a7c
Binary files /dev/null and b/site/source/assets/img/blog/1.7-performance-improvements-parallel-state.png differ
diff --git a/site/source/assets/img/blog/1.7-performance-improvements-rapidjson.png b/site/source/assets/img/blog/1.7-performance-improvements-rapidjson.png
new file mode 100644
index 000..b453c37
Binary files /dev/null and b/site/source/assets/img/blog/1.7-performance-improvements-rapidjson.png differ
diff --git a/site/source/blog/2018-10-08-mesos-1-7-0-performance-improvements.md b/site/source/blog/2018-10-08-mesos-1-7-0-performance-improvements.md
new file mode 100644
index 000..6780322
--- /dev/null
+++ b/site/source/blog/2018-10-08-mesos-1-7-0-performance-improvements.md
@@ -0,0 +1,115 @@
+---
+layout: post
+title: Performance Improvements in Mesos 1.7.0
+published: true
+post_author:
+  display_name: Benjamin Mahler
+  gravatar: fb43656d4d45f940160c3226c53309f5
+  twitter: bmahler
+tags: Performance
+---
+
+**Scalability and performance are key features for Mesos. Some users of Mesos already run production clusters that consist of many tens of thousands of nodes.** However, there remains a lot of room for improvement across a variety of areas of the system.
+
+The Mesos community has been working hard over the past few months to address several performance issues that have been affecting users. The following are some of the key performance improvements included in Mesos 1.7.0:
+
+* **Master `/state` endpoint:** Adopted [RapidJSON](http://rapidjson.org/) and reduced copying for a 2.3x throughput improvement due to a ~55% decrease in latency ([MESOS-9092](https://issues.apache.org/jira/browse/MESOS-9092)). Also, added parallel processing of `/state` requests to reduce master backlogging / interference under high request load ([MESOS-9122](https://issues.apache.org/jira/browse/MESOS-9122)).
+* **Allocator:** in 1.7.1 (these patches did not make 1.7.0 and were backported to 1.7.x), allocation cycle time was reduced. Some benchmarks show an 80% reduction. This, together with the reduced master backlogging from `/state` improvements, substantially reduces the end-to-end offer cycling time between Mesos and schedulers.
+* **Agent `/containers` endpoint:** Fixed a performance issue t