On Thu, Jul 30, 2015 at 03:42:24PM +0200, 'Klaus Aehlig' via ganeti-devel wrote:
When computing tiered allocation statistics, the normal
step is to shrink the resource showing the most errors.
However, for some abstract resources, like N+1 redundancy,
there is no physical resource this concept refers to;
nevertheless, there is an underlying physical resource that
most likely causes this kind of failure. For N+1 redundancy,
the missing resource is almost always memory. So shrink
based on this assumption.

Signed-off-by: Klaus Aehlig <[email protected]>
---
src/Ganeti/HTools/Cluster.hs | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/Ganeti/HTools/Cluster.hs b/src/Ganeti/HTools/Cluster.hs
index 891e3f9..a8313bf 100644
--- a/src/Ganeti/HTools/Cluster.hs
+++ b/src/Ganeti/HTools/Cluster.hs
@@ -861,6 +861,13 @@ sufficesShrinking allocFn inst fm =
  of x:_ -> Just . snd $ x
     _ -> Nothing

+-- | For a failure determine the underlying resource that most likely
+-- causes this kind of failure. In particular, N+1 violations are most
+-- likely caused by lack of memory.
+underlyingCause :: FailMode -> FailMode
+underlyingCause FailN1 = FailMem
+underlyingCause x = x
+
 -- | Tiered allocation method.
 --
 -- This places instances on the cluster, and decreases the spec until
@@ -877,7 +884,8 @@ tieredAlloc opts nl il limit newinst allocnodes ixes cstats =
                               Nothing -> (False, Nothing)
                               Just n -> (n <= ixes_cnt,
                                            Just (n - ixes_cnt))
-          sortedErrs = map fst $ sortBy (flip $ comparing snd) errs
+          sortedErrs = nub . map (underlyingCause . fst)
+                        $ sortBy (flip $ comparing snd) errs
          suffShrink = sufficesShrinking
                         (fromMaybe emptyAllocSolution
                          . flip (tryAlloc opts nl' il') allocnodes)
--
2.5.0.rc2.392.g76e840b


LGTM
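
For illustration, here is a minimal standalone sketch of what the new
sortedErrs expression does to the shrink order. It is not the Ganeti code:
FailMode is cut down to four constructors, and shrinkOrder is a hypothetical
helper name that only wraps the expression from the patch. The error counts
in main are made up.

```haskell
import Data.List (nub, sortBy)
import Data.Ord (comparing)

-- Assumption: a cut-down stand-in for Ganeti's FailMode type.
data FailMode = FailMem | FailDisk | FailCPU | FailN1
  deriving (Eq, Show)

-- Same mapping as in the patch: N+1 failures are attributed to memory.
underlyingCause :: FailMode -> FailMode
underlyingCause FailN1 = FailMem
underlyingCause x = x

-- Hypothetical helper wrapping the patched sortedErrs expression:
-- sort by error count (descending), map each failure to its underlying
-- cause, and drop duplicates while keeping the first occurrence.
shrinkOrder :: [(FailMode, Int)] -> [FailMode]
shrinkOrder = nub . map (underlyingCause . fst)
              . sortBy (flip $ comparing snd)

main :: IO ()
main = print $ shrinkOrder [(FailDisk, 3), (FailN1, 7), (FailMem, 2)]
-- prints [FailMem,FailDisk]
```

Since FailN1 has the highest count here, memory moves to the front of the
shrink order, and nub keeps the later FailMem entry from appearing twice.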
