Hi,
The host selection algorithm in koji/daemon.py (TaskManager.checkRelAvail) is
designed to choose a host that's in the top half of hosts by available
capacity. I don't think it quite works as expected, though. With five hosts
(A to E) that each start with a capacity of 3.0, and jobs that each use 1.0,
you could have the following behaviour (each line shows the available
capacities after the job is taken; a host with nothing left drops off the
list):
A takes job, capacities: [3.0, 3.0, 3.0, 3.0, 2.0] (median = 3.0)
B takes job, capacities: [3.0, 3.0, 3.0, 2.0, 2.0] (median = 3.0)
C takes job, capacities: [3.0, 3.0, 2.0, 2.0, 2.0] (median = 2.0)
A takes job, capacities: [3.0, 3.0, 2.0, 2.0, 1.0] (median = 2.0)
B takes job, capacities: [3.0, 3.0, 2.0, 1.0, 1.0] (median = 2.0)
C takes job, capacities: [3.0, 3.0, 1.0, 1.0, 1.0] (median = 1.0)
A takes job, capacities: [3.0, 3.0, 1.0, 1.0] (median = 3.0)
D takes job, capacities: [3.0, 2.0, 1.0, 1.0] (median = 2.0)
D takes job, capacities: [3.0, 1.0, 1.0, 1.0] (median = 1.0)
B takes job, capacities: [3.0, 1.0, 1.0] (median = 1.0)
C takes job, capacities: [3.0, 1.0] (median = 3.0)
So eleven jobs get divided up as:
3 on A
3 on B
3 on C
2 on D
0 on E
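This is because the >=median test doesn't actually ensure that this host is in
the top half, just that there's a host in the top half that's no better than
this one. A host that is merely tied with the median therefore accepts the job
even when other hosts have more capacity free. I think a better test would be
==best or >median, in which case you'd get: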
A takes job, capacities: [3.0, 3.0, 3.0, 3.0, 2.0]
B takes job, capacities: [3.0, 3.0, 3.0, 2.0, 2.0]
C takes job, capacities: [3.0, 3.0, 2.0, 2.0, 2.0]
D takes job, capacities: [3.0, 2.0, 2.0, 2.0, 2.0]
E takes job, capacities: [2.0, 2.0, 2.0, 2.0, 2.0]
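For anyone who wants to check the two traces, here is a rough Python 3 sketch
of both tests. It is not koji code: accepts_old, accepts_new and simulate are
made-up names, and it assumes that every job costs 1.0, that hosts are polled
in the fixed order A..E with the first one that accepts taking the job, and
that a host with no capacity left drops out of the list.

```python
def accepts_old(bin_avail, avail):
    """Current test: accept if we are at or above the (lower) median."""
    median = bin_avail[(len(bin_avail) - 1) // 2]
    return avail >= median


def accepts_new(bin_avail, avail):
    """Proposed test: accept if we tie the best host or beat the median."""
    best = bin_avail[0]
    median = bin_avail[len(bin_avail) // 2]
    return avail >= best or avail > median


def simulate(accepts, jobs=11, hosts="ABCDE", capacity=3.0, cost=1.0):
    """Return the sequence of hosts that take each job, as a string."""
    avail = {h: capacity for h in hosts}
    order = []
    for _ in range(jobs):
        # Capacities of hosts that still have room, best first.
        # Assumption: exhausted hosts are not listed at all.
        bin_avail = sorted((a for a in avail.values() if a > 0), reverse=True)
        # Assumption: hosts poll in a fixed order and the first one
        # whose test passes takes the job.
        for h in hosts:
            if avail[h] > 0 and accepts(bin_avail, avail[h]):
                avail[h] -= cost
                order.append(h)
                break
    return "".join(order)


print(simulate(accepts_old))  # ABCABCADDBC
print(simulate(accepts_new))  # ABCDEABCDEA
```

Under those assumptions the old test reproduces the first trace (3/3/3/2/0
after eleven jobs), while the proposed test hands out the first five jobs one
per host and ends up at 3/2/2/2/2. A different polling order would give a
different sequence, so treat this as an illustration rather than a proof.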
Patch attached for consideration.
Cheers,
aj
--
Anthony Towns <[email protected]>
diff --git a/koji/daemon.py b/koji/daemon.py
index b6f775a..a5bf268 100644
--- a/koji/daemon.py
+++ b/koji/daemon.py
@@ -763,8 +763,6 @@ class TaskManager(object):
#accept this task)
bin_avail = avail.get(bin, [0])
self.logger.debug("available capacities for bin: %r" % bin_avail)
- median = bin_avail[(len(bin_avail)-1)/2]
- self.logger.debug("ours: %.2f, median: %.2f" % (our_avail, median))
if not self.checkRelAvail(bin_avail, our_avail):
#decline for now and give the upper half a chance
return False
@@ -781,9 +779,10 @@ class TaskManager(object):
Check our available capacity against the capacity of other hosts in this bin.
Return True if we should take a task, False otherwise.
"""
- median = bin_avail[(len(bin_avail)-1)/2]
- self.logger.debug("ours: %.2f, median: %.2f" % (avail, median))
- if avail >= median:
+ best = bin_avail[0]
+ median = bin_avail[len(bin_avail)/2]
+ self.logger.debug("ours: %.2f, best: %.2f, median: %.2f" % (avail, best, median))
+ if avail >= best or avail > median:
return True
else:
self.logger.debug("Skipping - available capacity in lower half")