When computing tiered allocation statistics, the usual step is
to shrink the resource that caused the most allocation failures.
However, some failure reasons, like N+1 redundancy, are abstract:
there is no physical resource that could be shrunk directly.
Nevertheless, there is an underlying physical resource that most
likely causes this kind of failure; for N+1 redundancy, the
missing resource is almost always memory. So, when choosing which
resource to shrink, treat N+1 failures as memory failures.
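
The effect on the shrink order can be illustrated with a small,
self-contained sketch. It is not Ganeti code: `FailMode` below is a
simplified, hypothetical stand-in with only four constructors, and
`shrinkOrder` is a hypothetical helper that repeats the `sortedErrs`
expression from the diff, assuming (as `comparing snd` and `map fst`
suggest) that the error statistics pair each failure mode with a count.

```haskell
import Data.List (nub, sortBy)
import Data.Ord (comparing)

-- | Hypothetical, simplified stand-in for Ganeti's FailMode.
data FailMode = FailMem | FailDisk | FailCPU | FailN1
  deriving (Eq, Show)

-- | Same mapping as in the patch: N+1 violations are attributed to memory.
underlyingCause :: FailMode -> FailMode
underlyingCause FailN1 = FailMem
underlyingCause x = x

-- | Hypothetical helper: order failure modes by descending count,
-- replace abstract failures by their underlying cause, and drop
-- duplicates, keeping the first (highest-ranked) occurrence.
shrinkOrder :: [(FailMode, Int)] -> [FailMode]
shrinkOrder errs =
  nub . map (underlyingCause . fst) $ sortBy (flip $ comparing snd) errs

main :: IO ()
main =
  -- N+1 is the most frequent failure, so memory is shrunk first.
  print (shrinkOrder [(FailDisk, 2), (FailN1, 7), (FailMem, 1)])
```

Without the mapping, the order would start with `FailN1`, for which
there is nothing to shrink. The `nub` is needed because memory can now
appear twice; the merged entry takes the position of the higher-ranked
of the two, not of their summed counts.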

Signed-off-by: Klaus Aehlig <[email protected]>
---
 src/Ganeti/HTools/Cluster.hs | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/Ganeti/HTools/Cluster.hs b/src/Ganeti/HTools/Cluster.hs
index 891e3f9..a8313bf 100644
--- a/src/Ganeti/HTools/Cluster.hs
+++ b/src/Ganeti/HTools/Cluster.hs
@@ -861,6 +861,13 @@ sufficesShrinking allocFn inst fm =
   of x:_ -> Just . snd $ x
      _ -> Nothing
 
+-- | For a failure determine the underlying resource that most likely
+-- causes this kind of failure. In particular, N+1 violations are most
+-- likely caused by lack of memory.
+underlyingCause :: FailMode -> FailMode
+underlyingCause FailN1 = FailMem
+underlyingCause x = x
+
 -- | Tiered allocation method.
 --
 -- This places instances on the cluster, and decreases the spec until
@@ -877,7 +884,8 @@ tieredAlloc opts nl il limit newinst allocnodes ixes cstats =
                                Nothing -> (False, Nothing)
                                Just n -> (n <= ixes_cnt,
                                             Just (n - ixes_cnt))
-          sortedErrs = map fst $ sortBy (flip $ comparing snd) errs
+          sortedErrs = nub . map (underlyingCause . fst)
+                        $ sortBy (flip $ comparing snd) errs
           suffShrink = sufficesShrinking
                          (fromMaybe emptyAllocSolution
                           . flip (tryAlloc opts nl' il') allocnodes)
-- 
2.5.0.rc2.392.g76e840b