On Thu, Jul 30, 2015 at 03:42:24PM +0200, 'Klaus Aehlig' via ganeti-devel wrote:
When computing tiered allocation statistics, the normal
step is to shrink the resource showing the most errors.
However, for some abstract resources, like N+1 redundancy,
there is no physical resource this concept refers to;
nevertheless, there is an underlying physical resource that
most likely causes this kind of failure. For N+1 redundancy,
the missing resource is almost always memory. So shrink
based on this assumption.
Signed-off-by: Klaus Aehlig <[email protected]>
---
src/Ganeti/HTools/Cluster.hs | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/src/Ganeti/HTools/Cluster.hs b/src/Ganeti/HTools/Cluster.hs
index 891e3f9..a8313bf 100644
--- a/src/Ganeti/HTools/Cluster.hs
+++ b/src/Ganeti/HTools/Cluster.hs
@@ -861,6 +861,13 @@ sufficesShrinking allocFn inst fm =
of x:_ -> Just . snd $ x
_ -> Nothing
+-- | For a failure determine the underlying resource that most likely
+-- causes this kind of failure. In particular, N+1 violations are most
+-- likely caused by lack of memory.
+underlyingCause :: FailMode -> FailMode
+underlyingCause FailN1 = FailMem
+underlyingCause x = x
+
-- | Tiered allocation method.
--
-- This places instances on the cluster, and decreases the spec until
@@ -877,7 +884,8 @@ tieredAlloc opts nl il limit newinst allocnodes ixes cstats
=
Nothing -> (False, Nothing)
Just n -> (n <= ixes_cnt,
Just (n - ixes_cnt))
- sortedErrs = map fst $ sortBy (flip $ comparing snd) errs
+ sortedErrs = nub . map (underlyingCause . fst)
+ $ sortBy (flip $ comparing snd) errs
suffShrink = sufficesShrinking
(fromMaybe emptyAllocSolution
. flip (tryAlloc opts nl' il') allocnodes)
--
2.5.0.rc2.392.g76e840b
LGTM
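
For reference, a minimal standalone sketch of how the change affects the shrink order. The FailMode type here is a reduced stand-in (only FailMem and FailN1 appear in the patch; FailDisk and FailCPU are assumed for illustration), and shrinkOrder is a hypothetical helper that mirrors the new sortedErrs expression; it is not code from Cluster.hs. The nub is needed because mapping FailN1 to FailMem can yield FailMem twice.

```haskell
import Data.List (nub, sortBy)
import Data.Ord (comparing)

-- Reduced stand-in for Ganeti's FailMode (assumption: the real type
-- has more constructors than shown here).
data FailMode = FailMem | FailDisk | FailCPU | FailN1
  deriving (Eq, Show)

-- As in the patch: N+1 violations are attributed to lack of memory.
underlyingCause :: FailMode -> FailMode
underlyingCause FailN1 = FailMem
underlyingCause x = x

-- Hypothetical helper mirroring the new sortedErrs: sort failure modes
-- by descending error count, map each to its underlying cause, and
-- drop the duplicates that the mapping may introduce.
shrinkOrder :: [(FailMode, Int)] -> [FailMode]
shrinkOrder = nub . map (underlyingCause . fst)
              . sortBy (flip $ comparing snd)

main :: IO ()
main = print $ shrinkOrder [(FailDisk, 3), (FailN1, 7), (FailMem, 5)]
-- prints [FailMem,FailDisk]
```

With the old expression, the same input would have put FailN1 first, for which there is no physical resource to shrink.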