Fixed the default filter used by the allocator.

If a framework accepts/refuses an offer using a very long filter, the
`HierarchicalAllocator` will use the default filter instead, meaning
that it will filter the resources for only 5 seconds. This can happen
when a framework sets `Filter::refuse_seconds` to a number of seconds
larger than what fits in `Duration`.

This patch makes the hierarchical allocator cap the filter duration to
at most 365 days.

Review: https://reviews.apache.org/r/60525/


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/183cceef
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/183cceef
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/183cceef

Branch: refs/heads/master
Commit: 183cceef366586f4a55b6ba7144c4a8277eb9962
Parents: 8254fd3
Author: Gastón Kleiman <gas...@mesosphere.io>
Authored: Mon Aug 14 13:52:52 2017 -0700
Committer: Greg Mann <gregorywm...@gmail.com>
Committed: Mon Aug 14 13:56:34 2017 -0700

----------------------------------------------------------------------
 docs/scheduler-http-api.md                  |  6 ++-
 include/mesos/mesos.proto                   |  5 ++
 include/mesos/v1/mesos.proto                |  5 ++
 src/master/allocator/mesos/hierarchical.cpp | 68 +++++++++++++++---------
 4 files changed, 58 insertions(+), 26 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/183cceef/docs/scheduler-http-api.md
----------------------------------------------------------------------
diff --git a/docs/scheduler-http-api.md b/docs/scheduler-http-api.md
index 4a5d77b..bd0e8c2 100644
--- a/docs/scheduler-http-api.md
+++ b/docs/scheduler-http-api.md
@@ -165,7 +165,11 @@ HTTP/1.1 202 Accepted
 ```
 
 ### ACCEPT
-Sent by the scheduler when it accepts offer(s) sent by the master. The 
`ACCEPT` request includes the type of operations (e.g., launch task, launch 
task group, reserve resources, create volumes) that the scheduler wants to 
perform on the offers. Note that until the scheduler replies (accepts or 
declines) to an offer, the offer's resources are considered allocated to the 
offer's role and to the framework. Also, any of the offer's resources not used 
in the `ACCEPT` call (e.g., to launch a task or task group) are considered 
declined and might be reoffered to other frameworks. In other words, the same 
`OfferID` cannot be used in more than one `ACCEPT` call. These semantics might 
change when we add new features to Mesos (e.g., persistence, reservations, 
optimistic offers, resizeTask, etc.).
+Sent by the scheduler when it accepts offer(s) sent by the master. The 
`ACCEPT` request includes the type of operations (e.g., launch task, launch 
task group, reserve resources, create volumes) that the scheduler wants to 
perform on the offers. Note that until the scheduler replies (accepts or 
declines) to an offer, the offer's resources are considered allocated to the 
offer's role and to the framework. Also, any of the offer's resources not used 
in the `ACCEPT` call (e.g., to launch a task or task group) are considered 
declined and might be reoffered to other frameworks, meaning that they will not 
be reoffered to the scheduler for the amount of time defined by the filter. The 
same `OfferID` cannot be used in more than one `ACCEPT` call. These semantics 
might change when we add new features to Mesos (e.g., persistence, 
reservations, optimistic offers, resizeTask, etc.).
+
+The scheduler API uses `Filters.refuse_seconds` to specify the duration for 
which resources are considered declined. If `filters` is not set, then the 
default value defined in 
[mesos.proto](https://github.com/apache/mesos/blob/master/include/mesos/v1/mesos.proto)
 will be used.
+
+NOTE: Mesos will cap `Filters.refuse_seconds` at 31536000 seconds (365 days).
 
 ```
 ACCEPT Request (JSON):

http://git-wip-us.apache.org/repos/asf/mesos/blob/183cceef/include/mesos/mesos.proto
----------------------------------------------------------------------
diff --git a/include/mesos/mesos.proto b/include/mesos/mesos.proto
index 50a5caf..e39926e 100644
--- a/include/mesos/mesos.proto
+++ b/include/mesos/mesos.proto
@@ -2254,6 +2254,11 @@ message Filters {
   // SchedulerDriver::launchTasks. You MUST pass Filters with this
   // field set to change this behavior (i.e., get another offer which
   // includes unused resources sooner or later than the default).
+  //
+  // If this field is set to a number of seconds greater than 31536000
+  // (365 days), then the resources will be considered refused for 365
+  // days. If it is set to a negative number, then the default value
+  // will be used.
   optional double refuse_seconds = 1 [default = 5.0];
 }
 

http://git-wip-us.apache.org/repos/asf/mesos/blob/183cceef/include/mesos/v1/mesos.proto
----------------------------------------------------------------------
diff --git a/include/mesos/v1/mesos.proto b/include/mesos/v1/mesos.proto
index 9d47a8a..2979f9e 100644
--- a/include/mesos/v1/mesos.proto
+++ b/include/mesos/v1/mesos.proto
@@ -2237,6 +2237,11 @@ message Filters {
   // SchedulerDriver::launchTasks. You MUST pass Filters with this
   // field set to change this behavior (i.e., get another offer which
   // includes unused resources sooner or later than the default).
+  //
+  // If this field is set to a number of seconds greater than 31536000
+  // (365 days), then the resources will be considered refused for 365
+  // days. If it is set to a negative number, then the default value
+  // will be used.
   optional double refuse_seconds = 1 [default = 5.0];
 }
 

http://git-wip-us.apache.org/repos/asf/mesos/blob/183cceef/src/master/allocator/mesos/hierarchical.cpp
----------------------------------------------------------------------
diff --git a/src/master/allocator/mesos/hierarchical.cpp 
b/src/master/allocator/mesos/hierarchical.cpp
index e5372ba..2213130 100644
--- a/src/master/allocator/mesos/hierarchical.cpp
+++ b/src/master/allocator/mesos/hierarchical.cpp
@@ -1015,33 +1015,42 @@ void HierarchicalAllocatorProcess::updateInverseOffer(
     return;
   }
 
-  // Create a refused resource filter.
-  Try<Duration> seconds = Duration::create(filters.get().refuse_seconds());
+  // Create a refused inverse offer filter.
+  Try<Duration> timeout = Duration::create(Filters().refuse_seconds());
 
-  if (seconds.isError()) {
-    LOG(WARNING) << "Using the default value of 'refuse_seconds' to create"
-                 << " the refused inverse offer filter because the input value"
-                 << " is invalid: " << seconds.error();
+  if (filters->refuse_seconds() > Days(365).secs()) {
+    LOG(WARNING) << "Using 365 days to create the refused inverse offer"
+                 << " filter because the input value is too big";
 
-    seconds = Duration::create(Filters().refuse_seconds());
-  } else if (seconds.get() < Duration::zero()) {
+    timeout = Days(365);
+  } else if (filters->refuse_seconds() < 0) {
     LOG(WARNING) << "Using the default value of 'refuse_seconds' to create"
-                 << " the refused inverse offer filter because the input value"
-                 << " is negative";
+                 << " the refused inverse offer filter because the input"
+                 << " value is negative";
+
+    timeout = Duration::create(Filters().refuse_seconds());
+  } else {
+    timeout = Duration::create(filters->refuse_seconds());
 
-    seconds = Duration::create(Filters().refuse_seconds());
+    if (timeout.isError()) {
+      LOG(WARNING) << "Using the default value of 'refuse_seconds' to create"
+                   << " the refused inverse offer filter because the input"
+                   << " value is invalid: " + timeout.error();
+
+      timeout = Duration::create(Filters().refuse_seconds());
+    }
   }
 
-  CHECK_SOME(seconds);
+  CHECK_SOME(timeout);
 
-  if (seconds.get() != Duration::zero()) {
+  if (timeout.get() != Duration::zero()) {
     VLOG(1) << "Framework " << frameworkId
             << " filtered inverse offers from agent " << slaveId
-            << " for " << seconds.get();
+            << " for " << timeout.get();
 
     // Create a new inverse offer filter and delay its expiration.
     InverseOfferFilter* inverseOfferFilter =
-      new RefusedInverseOfferFilter(Timeout::in(seconds.get()));
+      new RefusedInverseOfferFilter(Timeout::in(timeout.get()));
 
     framework.inverseOfferFilters[slaveId].insert(inverseOfferFilter);
 
@@ -1053,7 +1062,7 @@ void HierarchicalAllocatorProcess::updateInverseOffer(
              InverseOfferFilter*) = &Self::expire;
 
     delay(
-        seconds.get(),
+        timeout.get(),
         self(),
         expireInverseOffer,
         frameworkId,
@@ -1167,20 +1176,29 @@ void HierarchicalAllocatorProcess::recoverResources(
   }
 
   // Create a refused resources filter.
-  Try<Duration> timeout = Duration::create(filters.get().refuse_seconds());
+  Try<Duration> timeout = Duration::create(Filters().refuse_seconds());
 
-  if (timeout.isError()) {
-    LOG(WARNING) << "Using the default value of 'refuse_seconds' to create"
-                 << " the refused resources filter because the input value"
-                 << " is invalid: " << timeout.error();
+  if (filters->refuse_seconds() > Days(365).secs()) {
+    LOG(WARNING) << "Using 365 days to create the refused resources offer"
+                 << " filter because the input value is too big";
 
-    timeout = Duration::create(Filters().refuse_seconds());
-  } else if (timeout.get() < Duration::zero()) {
+    timeout = Days(365);
+  } else if (filters->refuse_seconds() < 0) {
     LOG(WARNING) << "Using the default value of 'refuse_seconds' to create"
-                 << " the refused resources filter because the input value"
-                 << " is negative";
+                 << " the refused resources offer filter because the input"
+                 << " value is negative";
 
     timeout = Duration::create(Filters().refuse_seconds());
+  } else {
+    timeout = Duration::create(filters->refuse_seconds());
+
+    if (timeout.isError()) {
+      LOG(WARNING) << "Using the default value of 'refuse_seconds' to create"
+                   << " the refused resources offer filter because the input"
+                   << " value is invalid: " + timeout.error();
+
+      timeout = Duration::create(Filters().refuse_seconds());
+    }
   }
 
   CHECK_SOME(timeout);

Reply via email to