Hi,

The host selection algorithm in koji/daemon.py (TaskManager.checkRelAvail) is 
designed to choose a host that's in the top half of hosts by available 
capacity. I don't think it quite works as expected though -- for example, with 
five hosts (A to E) that each start with a capacity of 3.0, you could get the 
following behaviour (capacities are listed highest first, after the job is taken):

  A takes job, capacities: [3.0, 3.0, 3.0, 3.0, 2.0] (median = 3.0)
  B takes job, capacities: [3.0, 3.0, 3.0, 2.0, 2.0] (median = 3.0)
  C takes job, capacities: [3.0, 3.0, 2.0, 2.0, 2.0] (median = 2.0)
  A takes job, capacities: [3.0, 3.0, 2.0, 2.0, 1.0] (median = 2.0)
  B takes job, capacities: [3.0, 3.0, 2.0, 1.0, 1.0] (median = 2.0)
  C takes job, capacities: [3.0, 3.0, 1.0, 1.0, 1.0] (median = 1.0)
  A takes job, capacities: [3.0, 3.0, 1.0, 1.0] (median = 3.0)
  D takes job, capacities: [3.0, 2.0, 1.0, 1.0] (median = 2.0)
  D takes job, capacities: [3.0, 1.0, 1.0, 1.0] (median = 1.0)
  B takes job, capacities: [3.0, 1.0, 1.0] (median = 1.0)
  C takes job, capacities: [3.0, 1.0] (median = 3.0)

So eleven jobs get divided up as:

  3 on A
  3 on B
  3 on C
  2 on D
  0 on E
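
For anyone who wants to check the trace, here is a rough Python sketch that replays it. It is not the koji code: accepts_old and replay are names made up for this mail, each job is assumed to cost 1.0 of capacity, and hosts with nothing left are assumed to drop out of the list (as in the trace above). It uses // because the / in daemon.py relies on Python 2 integer division.

```python
def accepts_old(bin_avail, avail):
    """Pre-patch style test: accept if we are >= the median.

    bin_avail is assumed to be sorted from highest to lowest.
    """
    median = bin_avail[(len(bin_avail) - 1) // 2]
    return avail >= median


def replay(order, hosts="ABCDE", capacity=3.0, job_weight=1.0):
    """Let hosts try to take a job in the given order; count jobs per host."""
    avail = {h: capacity for h in hosts}
    taken = {h: 0 for h in hosts}
    for host in order:
        if avail[host] <= 0:
            continue  # this host has nothing left to offer
        # Assumption: exhausted hosts are not counted in the bin.
        bin_avail = sorted((a for a in avail.values() if a > 0), reverse=True)
        if accepts_old(bin_avail, avail[host]):
            avail[host] -= job_weight
            taken[host] += 1
    return taken


# The order in which hosts happened to ask for work in the trace above.
print(replay("ABCABCADDBC"))
# prints {'A': 3, 'B': 3, 'C': 3, 'D': 2, 'E': 0}
```

Every attempt in that order passes the >=median test, so nothing stops A, B and C from draining themselves while E sits idle.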

This is because the >=median test doesn't actually ensure this host is in the 
top half: when several hosts tie at the median value, a host passes even though 
hosts with more free capacity are still available. I think a better test would 
be ==best or >median, in which case you'd get:

  A takes job, capacities: [3.0, 3.0, 3.0, 3.0, 2.0]
  B takes job, capacities: [3.0, 3.0, 3.0, 2.0, 2.0]
  C takes job, capacities: [3.0, 3.0, 2.0, 2.0, 2.0]
  D takes job, capacities: [3.0, 2.0, 2.0, 2.0, 2.0]
  E takes job, capacities: [2.0, 2.0, 2.0, 2.0, 2.0]
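
A small side-by-side sketch of the two tests, in case it helps. The function names old_test and new_test are invented for illustration; the bodies follow the before and after versions of checkRelAvail in the patch, minus logging, with // standing in for Python 2's integer /. The list is assumed to be sorted highest first.

```python
def old_test(bin_avail, avail):
    """Current behaviour: accept when we are at or above the median."""
    median = bin_avail[(len(bin_avail) - 1) // 2]
    return avail >= median


def new_test(bin_avail, avail):
    """Proposed behaviour: accept when we tie the best host or beat the median."""
    best = bin_avail[0]
    median = bin_avail[len(bin_avail) // 2]
    return avail >= best or avail > median


# State after A, B and C have each taken one job.
bin_avail = [3.0, 3.0, 2.0, 2.0, 2.0]
for avail in (3.0, 2.0):
    print(avail, old_test(bin_avail, avail), new_test(bin_avail, avail))
# prints:
# 3.0 True True
# 2.0 True False
```

A host at 2.0 ties the median, so the old test lets it through; the new test makes it wait for D or E, which still have 3.0 free.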

Patch attached for consideration.

Cheers,
aj

-- 
Anthony Towns <[email protected]>
diff --git a/koji/daemon.py b/koji/daemon.py
index b6f775a..a5bf268 100644
--- a/koji/daemon.py
+++ b/koji/daemon.py
@@ -763,8 +763,6 @@ class TaskManager(object):
                 #accept this task)
                 bin_avail = avail.get(bin, [0])
                 self.logger.debug("available capacities for bin: %r" % bin_avail)
-                median = bin_avail[(len(bin_avail)-1)/2]
-                self.logger.debug("ours: %.2f, median: %.2f" % (our_avail, median))
                 if not self.checkRelAvail(bin_avail, our_avail):
                     #decline for now and give the upper half a chance
                     return False
@@ -781,9 +779,10 @@ class TaskManager(object):
         Check our available capacity against the capacity of other hosts in this bin.
         Return True if we should take a task, False otherwise.
         """
-        median = bin_avail[(len(bin_avail)-1)/2]
-        self.logger.debug("ours: %.2f, median: %.2f" % (avail, median))
-        if avail >= median:
+        best = bin_avail[0]
+        median = bin_avail[len(bin_avail)/2]
+        self.logger.debug("ours: %.2f, best: %.2f, median: %.2f" % (avail, best, median))
+        if avail >= best or avail > median:
             return True
         else:
             self.logger.debug("Skipping - available capacity in lower half")
--
buildsys mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/buildsys
